Methods for detecting G-protein coupled receptor (GPCR) activity; methods
for assaying GPCR activity; and methods for screening for GPCR ligands,
G-protein-coupled receptor kinase (GRK) activity, and compounds that interact
with components of the GPCR regulatory process are described. Included are
methods for expanding ICAST technologies for assaying GPCR activity with
applications for ligand fishing, and agonist or antagonist screening. These
methods include: engineering serine/threonine phosphorylation sites into known
or orphan GPCR open reading frames in order to increase the affinity of
arrestin for the activated form of the GPCR or to increase the residence time
of arrestin on the activated GPCR; engineering mutant arrestin proteins that
bind to activated GPCRs in the absence of G-protein coupled receptor kinases,
which may be limiting; and engineering mutant super arrestin proteins that
have an increased affinity for activated GPCRs with or without
phosphorylation. These methods are intended to increase the robustness of the
GPCR/ICAST technology in situations in which G-protein coupled receptor
kinases are absent or limiting, or in which the GPCR is not efficiently
down-regulated or is rapidly resensitized (thus having a labile interaction
with arrestin). Included are also more specific methods for using ICAST
complementary enzyme fragments to monitor GPCR homo- and hetero-dimerization
with applications for drug lead discovery and ligand and function discovery
for orphan GPCRs.
TITLE OF THE INVENTION

IMPROVED SYSTEMS FOR SENSITIVE DETECTION OF G-PROTEIN 
COUPLED RECEPTOR AND ORPHAN RECEPTOR FUNCTION 
USING REPORTER ENZYME MUTANT COMPLEMENTATION 

BACKGROUND OF THE INVENTION 
This application is a continuation-in-part of U.S. Application Serial No.
09/654,499, filed September 1, 2000, which claims the benefit from Provisional
Application Serial No. 60/180,669, filed February 7, 2000. The entirety of U.S.
Application Serial No. 09/654,499 and Provisional Application Serial No.
60/180,669 are incorporated herein by reference.

Field of the Invention 

The present invention relates to methods of detecting G-protein-coupled
receptor (GPCR) activity, and provides methods of assaying GPCR activity,
methods for screening for GPCR ligands, agonists and/or antagonists, methods for
screening natural and surrogate ligands for orphan GPCRs, and methods for
screening compounds that interact with components of the GPCR regulatory
process.


Background of the Technology

The actions of many extracellular signals are mediated by the interaction of
G-protein-coupled receptors (GPCRs) and guanine nucleotide-binding regulatory
proteins (G-proteins). G-protein-mediated signaling systems have been identified in
many divergent organisms, such as mammals and yeast. The GPCRs represent a
large superfamily of proteins which have divergent amino acid sequences, but
share common structural features, in particular, the presence of seven
transmembrane helical domains. GPCRs respond to, among other extracellular
signals, neurotransmitters, hormones, odorants and light. Individual GPCR types
activate a particular signal transduction pathway; at least ten different signal
transduction pathways are known to be activated via GPCRs. For example, the
beta 2-adrenergic receptor (β2AR) is a prototype mammalian GPCR. In response
to agonist binding, β2AR receptors activate a G-protein (Gs) which in turn
stimulates adenylate cyclase activity and results in increased cyclic adenosine
monophosphate (cAMP) production in the cell.

The signaling pathway and final cellular response that result from GPCR
stimulation depend on the specific class of G-protein with which the particular
receptor is coupled (Hamm, "The Many Faces of G-Protein Signaling," J. Biol.
Chem., 273:669-672 (1998)). For instance, coupling to the Gs class of G-proteins
stimulates cAMP production and activation of the Protein Kinase A and C
pathways, whereas coupling to the Gi class of G-proteins down-regulates cAMP.
Other second messenger systems such as calcium, phospholipase C, and
phosphatidylinositol 3 may also be utilized. As a consequence, GPCR signaling
events have predominantly been measured via quantification of these second
messenger products.

The decrease of a response to a persistent stimulus is a widespread
biological phenomenon. Signaling by diverse GPCRs is believed to be terminated
by a uniform two-step mechanism. Activated receptor is first phosphorylated by a
GPCR kinase (GRK). An arrestin protein binds to the activated and
phosphorylated receptor, thus blocking G-protein interaction. This process is
commonly referred to as desensitization, a general mechanism that has been
demonstrated in a variety of functionally diverse GPCRs. Arrestin also plays a part
in regulating GPCR internalization and resensitization, processes that are
heterogeneous among different GPCRs (Oakley, et al., J. Biol. Chem., 274:32248-
32257 (1999)). The interaction between an arrestin and GPCR in processes of
internalization and resensitization is dictated by the specific sequence motif in the
carboxyl terminus of a given GPCR. Only a subset of GPCRs, which possess
clusters of three serine or threonine residues at the carboxyl termini, were found to
co-traffic with the arrestins into the endocytic vesicles after ligand stimulation.
The number of receptor kinases and arrestins involved in desensitization of GPCRs
is rather limited.
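
By way of illustration only, the two-step termination mechanism described
above can be sketched as a small state machine. The state names, event names
and helper functions below are hypothetical labels chosen for this sketch and
are not taken from the specification. The sketch also simplifies the biology:
it treats G-protein interaction as blocked only once arrestin is bound, and it
omits internalization and resensitization.

```python
from enum import Enum, auto


class ReceptorState(Enum):
    """Hypothetical states for a single receptor in this toy model."""
    INACTIVE = auto()
    ACTIVE = auto()
    PHOSPHORYLATED = auto()
    ARRESTIN_BOUND = auto()


# Allowed transitions: agonist activation, then GRK phosphorylation (step 1),
# then arrestin binding (step 2). Event names are illustrative assumptions.
TRANSITIONS = {
    (ReceptorState.INACTIVE, "agonist_binds"): ReceptorState.ACTIVE,
    (ReceptorState.ACTIVE, "grk_phosphorylates"): ReceptorState.PHOSPHORYLATED,
    (ReceptorState.PHOSPHORYLATED, "arrestin_binds"): ReceptorState.ARRESTIN_BOUND,
}


def step(state, event):
    """Return the next state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, start=ReceptorState.INACTIVE):
    """Apply a sequence of events in order and return the final state."""
    state = start
    for event in events:
        state = step(state, event)
    return state


def g_protein_blocked(state):
    """In this sketch, G-protein interaction is blocked once arrestin is bound."""
    return state is ReceptorState.ARRESTIN_BOUND


final = run(["agonist_binds", "grk_phosphorylates", "arrestin_binds"])
print(final.name, g_protein_blocked(final))
```

In this sketch an arrestin-binding event applied to an inactive receptor has
no effect, which mirrors the statement that arrestin binds the activated and
phosphorylated receptor.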

A common feature of GPCR physiology is desensitization and recycling of
the receptor through the processes of receptor phosphorylation, endocytosis and
dephosphorylation (Ferguson, et al., "G-protein-coupled receptor regulation: role of
G-protein-coupled receptor kinases and arrestins," Can. J. Physiol. Pharmacol.,
74:1095-1110 (1996)). Ligand-occupied GPCRs can be phosphorylated by two
families of serine/threonine kinases, the G-protein-coupled receptor kinases
(GRKs) and the second messenger-dependent protein kinases such as protein
kinase A and protein kinase C. Phosphorylation by either class of kinases serves to
down-regulate the receptor by uncoupling it from its corresponding G-protein.
GRK-phosphorylation also serves to down-regulate the receptor by recruitment of a
class of proteins known as the arrestins that bind the cytoplasmic domain of the
receptor and promote clustering of the receptor into endocytic vesicles. Once the
receptor is endocytosed, it will either be degraded in lysosomes or
dephosphorylated and recycled back to the plasma membrane as a fully-functional
receptor.

WO 01/58923

PCT/US01/00684

Binding of an arrestin protein to an activated receptor has been documented
as a common phenomenon of a variety of GPCRs ranging from rhodopsin to β2AR
to the neurotensin receptor (Barak, et al., "A β-arrestin/Green Fluorescent Fusion
Protein Biosensor for Detecting G-Protein-Coupled Receptor Activation," J. Biol.
Chem., 272:27497-500 (1997)). Consequently, monitoring arrestin interaction with
a specific GPCR can be utilized as a generic tool for measuring GPCR activation.
Similarly, a single G-protein and GRK also partner with a variety of receptors
(Hamm, et al. (1998) and Pitcher et al., "G-Protein-Coupled Receptor Kinases,"
Annu. Rev. Biochem., 67:653-92 (1998)), such that these protein/protein
interactions may also be monitored to determine receptor activity.

Many therapeutic drugs in use today target GPCRs, as they regulate vital
physiological responses, including vasodilation, heart rate, bronchodilation,
endocrine secretion and gut peristalsis. See, e.g., Lefkowitz et al., Annu. Rev.
Biochem., 52:159 (1983). Some of these drugs mimic the ligand for this receptor.
Other drugs act to antagonize the receptor in cases when disease arises from
spontaneous activity of the receptor.

Efforts such as the Human Genome Project are identifying new GPCRs
("orphan" receptors) whose physiological roles and ligands are unknown. It is
estimated that several thousand GPCRs exist in the human genome.

Various approaches have been used to monitor intracellular activity in
response to a stimulant, e.g., enzyme-linked immunosorbent assay (ELISA);
Fluorescence Imaging Plate Reader assay (FLIPR™, Molecular Devices Corp.,
Sunnyvale, CA); EVOscreen™, EVOTEC™, Evotec Biosystems GmbH, Hamburg,
Germany; and techniques developed by CELLOMICS™, Cellomics, Inc.,
Pittsburgh, PA.

Germino et al., "Screening for in vivo protein-protein interactions," Proc.
Natl. Acad. Sci., 90(3):933-937 (1993), discloses an in vivo approach for the
isolation of proteins interacting with a protein of interest.

Phizicky et al., "Protein-protein interactions: methods for detection and
analysis," Microbiol. Rev., 59(1):94-123 (1995), discloses a review of
biochemical, molecular biological and genetic methods used to study
protein-protein interactions.

Offermanns et al., "Gα15 and Gα16 Couple a Wide Variety of Receptors to
Phospholipase C," J. Biol. Chem., 270(25):15175-15180 (1995), discloses that
Gα15 and Gα16 can be activated by a wide variety of G-protein-coupled receptors.
The selective coupling of an activated receptor to a distinct pattern of G-proteins is
regarded as an important requirement to achieve accurate signal transduction. Id.

Barak et al., "A β-arrestin/Green Fluorescent Protein Biosensor for
Detecting G Protein-coupled Receptor Activation," J. Biol. Chem., 272(44):27497-
27500 (1997) and U.S. Patents Nos. 5,891,646 and 6,110,693 disclose the use of a
β-arrestin/green fluorescent fusion protein (GFP) for imaging protein translocation
upon stimulation of GPCR with optical devices.

Each of the references described above has drawbacks. For example, 

• The prior art methodologies require over-expression of the proteins, 
which could cause artifact and tip the balance of cellular regulatory 
machineries. 

• The prior art visualization or imaging assays are low throughput and
lack thorough quantification. Therefore, they are not suitable for
high throughput pharmacological and kinetic assays.

In addition, many of the prior art assays require isolation of the GPCR rather than
observation of the GPCR in a cell. There thus exists a need for improved methods
for monitoring GPCR function.

SUMMARY OF THE INVENTION

The present invention provides modifications to the disclosure in U.S.
Application Serial No. 09/654,499. In particular, the present invention is directed
to modifications of the below aspects of the invention to further enhance assay
sensitivity. The modifications include the use of genetically modified arrestins that
exhibit enhanced binding to activated GPCR regardless of whether the GPCR is
phosphorylated or non-phosphorylated; the use of a serine/threonine cluster
strategy to facilitate screening assays for orphan receptors that do not possess this
structural motif on their own; and the use of a combination of the above
modifications to achieve even more enhanced detection.

A first aspect of the present invention is a method that monitors GPCR
function proximally at the site of receptor activation, thus providing more
information for drug discovery purposes due to fewer competing mechanisms.
Activation of the GPCR is measured by a read-out for interaction of the receptor
with a regulatory component such as arrestin, G-protein, GRK or other kinases, the
binding of which to the receptor is dependent upon agonist occupation of the
receptor. The present invention involves the detection of protein/protein
interaction by complementation of mutant reporter enzymes.

Binding of arrestin to activated GPCR is a common process in the first step
of desensitization that has been demonstrated for most, if not all, GPCRs studied so
far. Measurement of GPCR interaction with arrestin via mutant enzyme
complementation (i.e., ICAST) provides a more generic assay technology
applicable for a wide variety of GPCRs and orphan receptors.

A further aspect of the present invention is a method of assessing GPCR
pathway activity under test conditions by providing a test cell that expresses a
GPCR, e.g., muscarinic, adrenergic, dopamine, angiotensin or endothelin, as a
fusion protein to a mutant reporter enzyme and an interacting protein in the GPCR
pathway, e.g., G-protein, arrestin or GRK, as a fusion protein with a
complementing mutant reporter enzyme. When test cells are exposed to a known
agonist to the target GPCR under test conditions, activation of the GPCR will be
monitored by complementation of the reporter enzyme. Increased reporter enzyme
activity reflects interaction of the GPCR with its interacting protein partner.

A further aspect of the present invention is a method of assessing GPCR
pathway activity in the presence of a test arrestin, e.g., β-arrestin.

A further aspect of the present invention is a method of assessing GPCR
pathway activity in the presence of a test G-protein.

A further aspect of the present invention is a method of assessing GPCR
pathway activity upon exposure of the test cell to a test ligand.

A further aspect of the present invention is a method of assessing GPCR
activity upon co-expression in the test cell of a second receptor. The second
receptor could be the same GPCR or orphan receptor (i.e., homo-dimerization), a
different GPCR or orphan receptor (i.e., hetero-dimerization) or could be a receptor
of another type.

A further aspect of the present invention is a method for screening for a
ligand or agonist to an orphan GPCR. The ligand or agonist could be contained in
natural or synthetic libraries or mixtures or could be a physical stimulus. A test
cell is provided that expresses the orphan GPCR as a fusion protein with a mutant
reporter enzyme, e.g., a β-galactosidase mutant, and, for example, an arrestin or
mutant form of arrestin as a fusion protein with a complementing mutant reporter
enzyme, e.g., another β-galactosidase mutant. The interaction of the arrestin with
the orphan GPCR upon receptor activation is measured by enzymatic activity of the
complemented reporter enzyme. The test cell is exposed to a test compound, and
an increase in reporter enzyme activity indicates the presence of a ligand or agonist.
A further aspect of the present invention is a method for screening a
protein of interest, for example, an arrestin protein (or mutant form of the arrestin
protein) for the ability to bind to a phosphorylated, or activated, GPCR. A test cell
is provided that expresses a GPCR as a fusion protein with a mutant reporter
enzyme, e.g., a β-galactosidase mutant, and contains arrestin (or a mutant form of
arrestin) as a fusion protein with a complementing mutant reporter enzyme, e.g.,
another β-galactosidase mutant. The interaction of arrestin with the GPCR upon
receptor activation is measured by enzymatic activity of the complemented reporter
enzyme. The test cell is exposed to a known GPCR agonist and then reporter
enzyme activity is detected. Increased reporter enzyme activity indicates that the
β-arrestin molecule can bind to phosphorylated, or activated, GPCR in the test cell.

A further aspect of the present invention is a method to screen for an
agonist to a specific GPCR. The agonist could be contained in natural or synthetic
libraries or could be a physical stimulus. A test cell is provided that expresses a
GPCR as a fusion protein with a mutant reporter enzyme, e.g., a β-galactosidase
mutant, and, for example, an arrestin as a fusion protein with a complementing
mutant reporter enzyme, e.g., another β-galactosidase mutant. The interaction of
arrestin with the GPCR upon receptor activation is measured by enzymatic activity
of the complemented reporter enzyme. The test cell is exposed to a test compound,
and an increase in reporter enzyme activity indicates the presence of an agonist.
The test cell may express a known GPCR or a variety of known GPCRs, or may
express an unknown GPCR or a variety of unknown GPCRs. The GPCR may be,
for example, an odorant GPCR or a PAR GPCR.
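
As a non-limiting illustration of the agonist read-out just described, the
following sketch flags a test compound when reporter enzyme activity rises
above the untreated level. The function names and the two-fold threshold are
assumptions made for this example only; the specification does not prescribe
a particular cut-off or data format.

```python
def fold_over_basal(treated_signal, basal_signal):
    """Ratio of reporter signal in treated cells to signal in untreated cells."""
    if basal_signal <= 0:
        raise ValueError("basal_signal must be positive")
    return treated_signal / basal_signal


def is_agonist_hit(treated_signal, basal_signal, min_fold=2.0):
    """Flag a compound as a candidate agonist.

    min_fold is an arbitrary illustrative threshold, not a value from the
    specification; a real screen would set it from control wells.
    """
    return fold_over_basal(treated_signal, basal_signal) >= min_fold


# Example with made-up luminometer readings (relative light units).
print(fold_over_basal(3000.0, 1000.0))
print(is_agonist_hit(3000.0, 1000.0))
print(is_agonist_hit(1100.0, 1000.0))
```

Expressing the read-out as a ratio to the untreated signal keeps the check
independent of the absolute signal level of a given clone.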
A further aspect of the present invention is a method for screening a test
compound for GPCR antagonist activity. A test cell is provided that expresses a
GPCR as a fusion protein with a mutant reporter enzyme, e.g., a β-galactosidase
mutant, and, for example, an arrestin as a fusion protein with a complementing
mutant reporter enzyme, e.g., another β-galactosidase mutant. The interaction of
arrestin with the GPCR upon receptor activation is measured by enzymatic activity
of the complemented reporter enzyme. The test cell is exposed to a test compound,
and an increase in reporter enzyme activity indicates the presence of an agonist.
The cell is exposed to a test compound and to a GPCR agonist, and reporter
enzyme activity is detected. When exposure to the agonist occurs at the same time
as or subsequent to exposure to the test compound, a decrease in reporter enzyme
activity after exposure to the test compound indicates that the test compound has
antagonist activity to the GPCR.
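
The antagonist read-out above can be illustrated, purely as a sketch, by
expressing the decrease in reporter enzyme activity as a percentage of the
agonist-induced signal window. The function names, the use of an untreated
(basal) control to define the window, and the 50% cut-off are all assumptions
introduced for this example.

```python
def percent_inhibition(agonist_only, agonist_plus_test, basal):
    """Percent reduction of the agonist-induced reporter signal.

    agonist_only:      signal with agonist alone
    agonist_plus_test: signal with agonist and test compound
    basal:             signal with neither (assumed control, not in the source)
    """
    window = agonist_only - basal
    if window <= 0:
        raise ValueError("agonist_only must exceed basal to define a window")
    return 100.0 * (agonist_only - agonist_plus_test) / window


def has_antagonist_activity(agonist_only, agonist_plus_test, basal,
                            min_inhibition=50.0):
    """Flag antagonist activity; min_inhibition is an illustrative threshold."""
    return percent_inhibition(agonist_only, agonist_plus_test, basal) >= min_inhibition


# Example with made-up readings (relative light units).
print(percent_inhibition(5000.0, 3000.0, 1000.0))
print(has_antagonist_activity(5000.0, 3000.0, 1000.0))
print(has_antagonist_activity(5000.0, 4800.0, 1000.0))
```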

A further aspect of the present invention is a method of screening a sample
solution for the presence of an agonist, antagonist or ligand to a GPCR. A test cell
is provided that expresses GPCR as a fusion protein with a mutant reporter
enzyme, e.g., a β-galactosidase mutant, and contains, for example, a β-arrestin as a
fusion protein with a complementing reporter, e.g., another β-galactosidase mutant.
The test cell is exposed to a sample solution, and reporter enzyme activity is
assessed. Changed reporter enzyme activity after exposure to the sample solution
indicates the sample solution contains an agonist, antagonist or ligand for a GPCR
expressed in the cell.
A further aspect of the present invention is a method of screening a cell for
the presence of a GPCR. According to this aspect, an arrestin fusion protein with a
mutant reporter enzyme and a GPCR downstream signaling fusion protein with a
mutant reporter enzyme are employed to detect GPCR action. A modification of
this aspect of the invention can be employed to provide a method of screening a
plurality of cells for those cells which contain a GPCR. According to this aspect, a
plurality of cells containing a conjugate comprising a β-arrestin protein as a fusion
protein with a reporter enzyme are provided; the plurality of cells are exposed to a
GPCR agonist; and reporter enzyme activity is detected. An increase in
reporter enzymatic activity after exposure to the GPCR agonist indicates β-arrestin
protein binding to a GPCR, thereby indicating that the cell contains a GPCR
responsive to the GPCR agonist.

A further aspect of the invention is a method for mapping GPCR-mediated
signaling pathways. For instance, the system could be utilized to monitor
interaction of c-src with β-arrestin-1 upon GPCR activation. Additionally, the
system could be used to monitor protein/protein interactions involved in cross-talk
between GPCR signaling pathways and other pathways such as that of the receptor
tyrosine kinases or Ras/Raf. According to this aspect, a test cell is provided that
expresses a GPCR or other related protein with a mutant reporter enzyme, e.g., a
β-galactosidase mutant, and contains a protein from another pathway as a fusion
protein with a complementing mutant reporter enzyme, e.g., another
β-galactosidase mutant. Increased reporter enzymatic activity indicates
protein/protein interaction.
A further aspect of the invention is a method for monitoring homo- or
hetero-dimerization of GPCRs upon agonist or antagonist stimulation. Increasing
evidence indicates that GPCR dimerization is important for biological activity
(AbdAlla, et al., "AT1-receptor heterodimers show enhanced G-protein activation
and altered receptor sequestration," Nature, 407:94-98 (2000); Bockaert, et al.,
"Molecular tinkering of G protein-coupled receptors: an evolutionary success,"
EMBO J. 18:1723-29 (1999)). Jordan, et al., "G-protein-coupled receptor
heterodimerization modulates receptor function," Nature, 399:697-700 (1999),
demonstrated that two non-functional opioid receptors, κ and δ, heterodimerize to
form a functional receptor. Gordon et al., "Dopamine D2 receptor dimers and
receptor blocking peptides," Bioch. Biophys. Res. Commun. 227:200-204 (1996),
showed different pharmacological properties associated with the monomeric and
dimeric forms of Dopamine receptor D2. The D2 receptors exist either as
monomers that are selective targets for spiperone or as dimer forms that are targets
for nemonapride. Herbert, et al., "A peptide derived from a β2-adrenergic receptor
transmembrane domain inhibits both receptor dimerization and activation," J.B.C.
271:16384-92 (1996), demonstrated that the agonist stimulation was found to
stabilize the dimeric state of the receptor, whereas inverse agonists favored the
monomeric form. Indeed, the same study showed that a peptide corresponding to
the sixth transmembrane domain of the β2-adrenergic receptor inhibited both
receptor dimerization and activation. Further, Angers et al., "Detection of beta-2-
adrenergic receptor dimerization in living cells using bioluminescence resonance
energy transfer," Proc. Natl. Acad. Sci. USA, 97(7):3684-3689, discloses the use of
β2-adrenergic receptor fusion proteins (i.e., β2-adrenergic receptor fused to
luciferase and β2-adrenergic receptor fused to an enhanced red-shifted green
fluorescent protein) to study β2-adrenergic receptor dimerization.

GPCR dimerization in the context of cellular physiology and
pharmacology can be monitored in accordance with the invention. For example,
β-galactosidase complementation can be measured in test cells that co-express GPCR
fusion proteins of β-galactosidase mutant enzymes (e.g., GPCR1Δα and GPCR2Δω)
(FIGURE 27). According to this aspect, the interconversion between monomeric
and dimeric forms of the GPCRs or orphan receptors can be measured by mutant
reporter enzyme complementation. FIGURE 27 illustrates a test cell co-expressing
GPCR or an orphan receptor as a fusion protein with the Δα form of β-galactosidase
mutant (e.g., GPCR1Δα), and the same GPCR or orphan receptor as a fusion
protein with the Δω form of β-galactosidase mutant (e.g., GPCR1Δω). Formation of
the GPCR homodimer is reflected by formation of an active enzyme, which can be
measured by enzyme activity assays, such as the Gal-Screen™ assay. Similarly,
hetero-dimerization between two distinct GPCRs, or two distinct orphan receptors,
or between one known GPCR and one orphan receptor can be analyzed in test cells
co-expressing two fusion proteins, e.g., GPCR1Δα and GPCR2Δω. The increased
β-galactosidase activity indicates that the two receptors can form a heterodimer.

A further aspect of the invention is a method of monitoring the
interconversion between the monomeric and dimeric form of GPCRs under the
influence of agonist or antagonist treatment. The test receptor(s) can be between
the same GPCR or orphan receptor (homodimer), or between two distinct GPCRs
or orphan receptors (heterodimer). The increased β-galactosidase activity after
treatment with a compound means that the compound binds to and/or stabilizes the
dimeric form of the receptor. The decreased β-galactosidase activity after
treatment with a compound means that the compound binds to and/or stabilizes the
monomeric form of the receptor.
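
The interpretation rule stated above (increased activity suggests stabilization
of the dimeric form, decreased activity suggests stabilization of the monomeric
form) can be sketched as follows. The function name, the returned labels and
the 20% tolerance band for "no clear effect" are hypothetical choices made for
this illustration and are not part of the specification.

```python
def classify_dimer_effect(activity_before, activity_after, tolerance=0.2):
    """Interpret a change in complemented beta-galactosidase activity.

    tolerance is an assumed relative-change band (here 20%) inside which the
    change is treated as noise; the specification does not define one.
    """
    if activity_before <= 0:
        raise ValueError("activity_before must be positive")
    relative_change = (activity_after - activity_before) / activity_before
    if relative_change > tolerance:
        return "stabilizes dimer"
    if relative_change < -tolerance:
        return "stabilizes monomer"
    return "no clear effect"


# Example with made-up activity readings before and after compound treatment.
print(classify_dimer_effect(1000.0, 1500.0))
print(classify_dimer_effect(1000.0, 600.0))
print(classify_dimer_effect(1000.0, 1050.0))
```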

A further aspect of the invention is a method of screening a cell for the
presence of a GPCR responsive to a GPCR agonist. A cell is provided that
contains protein partners that interact downstream in the GPCR's pathway. The
protein partners are expressed as fusion proteins to the mutant, complementing
enzyme and are used to monitor activation of the GPCR. The cell is exposed to a
GPCR agonist and then enzymatic activity of the reporter enzyme is detected.
Increased reporter enzyme activity indicates that the cell contains a GPCR
responsive to the agonist.

The present invention involves the use of a combination of proprietary
technologies (including ICAST™, Intercistronic Complementation Analysis
Screening Technology, Gal-Screen™, etc.) to monitor protein/protein interactions
in GPCR signaling. As disclosed in U.S. Application Serial No. 09/654,499, the
method of the invention in part involves using ICAST™, which in turn involves
the use of two inactive β-galactosidase mutants, each of which is fused with one of
two interacting target protein pairs, such as a GPCR and an arrestin. The formation
of an active β-galactosidase complex is driven by interaction of the target proteins.
In this system, β-galactosidase activity can be detected using, e.g., the
Gal-Screen™ assay system, wherein direct cell lysis is combined with rapid
ultrasensitive chemiluminescent detection of β-galactosidase reporter enzyme.
This system uses, e.g., a Galacton-Star® chemiluminescent substrate for
measurement in a luminometer as a read out of GPCR activity.
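
The complementation principle described above can be summarized, as a
deliberately simplified logical sketch, by the following code. The `Fusion`
class, the fragment labels "alpha" and "omega" (standing in for the two
inactive, mutually complementing β-galactosidase mutants), and the yes/no
treatment of target protein interaction are assumptions of this illustration;
real complementation is quantitative rather than binary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fusion:
    """A target protein fused to one inactive reporter fragment (toy model)."""
    partner: str   # e.g. "GPCR" or "arrestin"
    fragment: str  # "alpha" or "omega"; labels are assumptions of this sketch


def fragments_complement(a, b):
    """True only when the pair carries one of each complementing fragment."""
    return {a.fragment, b.fragment} == {"alpha", "omega"}


def reporter_active(a, b, partners_interact):
    """Active enzyme forms only if fragments complement AND partners interact."""
    return fragments_complement(a, b) and partners_interact


gpcr = Fusion("GPCR", "alpha")
arrestin = Fusion("arrestin", "omega")

print(reporter_active(gpcr, arrestin, partners_interact=True))
print(reporter_active(gpcr, arrestin, partners_interact=False))
print(reporter_active(gpcr, Fusion("GPCR", "alpha"), partners_interact=True))
```

The last call illustrates why a dimerization assay in this toy model needs the
same receptor fused to each of the two different fragments: two copies of the
same fragment do not complement one another.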

FIGURE 23 is a schematic depicting the use of the complementation
technology in the method of the present invention. FIGURE 23 shows two inactive
β-galactosidase mutants that become active when they are forced together by
specific interactions between the fusion partners of an arrestin molecule and an
activated GPCR or orphan receptor. This assay technology will be especially
useful in high throughput screening assays for ligand fishing for orphan receptors, a
process called de-orphaning. As illustrated in FIGURE 28, a β-galactosidase
fusion protein of an orphan receptor (e.g., GPCRorphanΔα) is co-expressed in the test
cell with a fusion protein of β-arrestin (e.g., β-ArrΔω). When the test cell is
subjected to compounds, which could be natural or synthetic, the increased
β-galactosidase activity means the compound is either a natural or surrogate ligand
for this GPCR. The same assay system can be used to find drug leads for the new
GPCRs. The increased β-galactosidase activity in the test cell after treatment
indicates the agonist activity of the compound. The decreased β-galactosidase
activity in the test cell indicates antagonist activity or inverse agonist activity of the
compound. In addition, the method of the invention could be used to monitor
GPCR-mediated signaling pathways via other downstream signaling components
such as G-proteins, GRKs or the proto-oncogene c-Src.

The invention is achieved in part by using ICAST™ protein/protein
interaction screening to map signaling pathways. This technology is applicable to
a variety of known and unknown GPCRs with diverse functions. They include, but
are not limited to, the following sub-families of GPCRs:

(a) receptors that bind to amine-like ligands: Acetylcholine muscarinic
receptor (M1 to M5), alpha and beta Adrenoceptors, Dopamine receptors (D1, D2,
D3 and D4), Histamine receptors (H1 and H2), Octopamine receptor and Serotonin
receptors (5HT1, 5HT2, 5HT4, 5HT5, 5HT6, 5HT7);

(b) receptors that bind to a peptide ligand: Angiotensin receptor, Bombesin
receptor, Bradykinin receptor, C-C chemokine receptors (CCR1 to CCR8, and
CCR10), C-X-C type Chemokine receptors (CXC-R5), Cholecystokinin type A
receptor, CCK type receptors, Endothelin receptor, Neurotensin receptor,
FMLP-related receptors, Somatostatin receptors (type 1 to type 5) and Opioid
receptors (type D, K, M, X);

(c) receptors that bind to hormone proteins: Follicle stimulating hormone
receptor, Thyrotrophin receptor and Lutropin-choriogonadotropic hormone
receptor;

(d) receptors that bind to neurotransmitters: Substance P receptor,
Substance K receptor and neuropeptide Y receptor;

(e) Olfactory receptors: Olfactory type 1 to type 11, Gustatory and odorant
receptors;

(f) Prostanoid receptors: Prostaglandin E2 (EP1 to EP4 subtypes),
Prostacyclin and Thromboxane;

(g) receptors that bind to metabotropic substances: Metabotropic glutamate
group I to group III receptors;

(h) receptors that respond to physical stimuli, such as light, or to chemical
stimuli, such as taste and smell; and

(i) orphan GPCRs: the natural ligand to the receptor is undefined.

Use of the ICAST™ technology in combination with the invention
provides many benefits to the GPCR screening process, including the ability to
monitor protein interactions in any sub-cellular compartment (membrane, cytosol
and nucleus); the ability to achieve a more physiologically relevant model without
requiring protein overexpression; and the ability to achieve a functional assay for
receptor binding allowing high information content.


BRIEF DESCRIPTION OF THE DRAWINGS 
FIGURE 1. Cellular expression levels of β2 adrenergic receptor (β2AR)
and β-arrestin-2 (βArr2) in C2 clones. Quantification of β-galactosidase (β-gal)
fusion protein was performed using antibodies against β-gal and purified β-gal
protein in a titration curve by a standardized ELISA assay. Figure 1A shows
expression levels of β2AR-βgalΔα clones (in expression vector pICAST ALC).
Figure 1B shows expression levels of βArr2-βgalΔω in expression vector pICAST
OMC4 for clones 9-3, -7, -9, -10, -19 and -24, or in expression vector pICAST
OMN4 for clones 12-4, -9, -16, -18, -22 and -24.

FIGURE 2. Receptor β2AR activation was measured by agonist-stimulated
cAMP production. C2 cells expressing pICAST ALC β2AR (clone 5) or parental
cells were treated with increasing concentrations of (-)isoproterenol and 0.1 mM
IBMX. The quantification of cAMP level was expressed as pmol/well.

FIGURE 3. Interaction of activated receptor β2AR and arrestin can be
measured by β-galactosidase complementation. Figure 3A shows a time course of
β-galactosidase activity in response to agonist (-)isoproterenol stimulation in C2
expressing β2AR-βgalΔα (β2AR alone, in expression vector pICAST ALC), or a
pool of doubly transduced C2 co-expressing β2AR-βgalΔα and βArr2-βgalΔω (in
expression vectors pICAST ALC and pICAST OMC and clones isolated from the
same pool (43-1, 43-2, 43-7 and 43-8)). Figure 3B shows a time course of
β-galactosidase activity in response to agonist (-)isoproterenol stimulation in C2 cells
expressing β2AR-βgalΔα alone (in expression vector pICAST ALC) and C2 clones
co-expressing β2AR-βgalΔα and βArr1-βgalΔω (in expression vectors pICAST
ALC and pICAST OMC).

FIGURE 4. Agonist dose response for interaction of β2AR and arrestin can
be measured by β-galactosidase complementation. Figure 4A shows a dose
response to agonists (-)isoproterenol and procaterol in C2 cells co-expressing
β2AR-βgalΔα and βArr2-βgalΔω fusion constructs. Figure 4B shows a dose
response to agonists (-)isoproterenol and procaterol in C2 cells co-expressing
β2AR-βgalΔα and βArr1-βgalΔω fusion constructs.

FIGURE 5. Antagonist mediated inhibition of receptor activity can be
measured by β-galactosidase complementation in cells co-expressing
β2AR-βgalΔα and βArr-βgalΔω. Figure 5A shows specific inhibition with adrenergic
antagonists ICI-118,551 and propranolol of β-galactosidase activity in C2 clones
co-expressing β2AR-βgalΔα and βArr2-βgalΔω fusion constructs after incubation
with agonist (-)isoproterenol. Figure 5B shows specific inhibition of
β-galactosidase activity with adrenergic antagonists ICI-118,551 and propranolol in
C2 clones co-expressing β2AR-βgalΔα and βArr1-βgalΔω fusion constructs in the
presence of agonist (-)isoproterenol.

FIGURE 6. C2 cells expressing adenosine receptor A2a show cAMP
induction in response to agonist (CGS-21680) treatment. C2 parental cells and C2
cells co-expressing A2aR-βgalΔα and βArr1-βgalΔω as a pool or as selected clones
(47-2 and 47-13) were measured for agonist-induced cAMP response (pmol/well).

FIGURE 7. Agonist stimulated cAMP response in C2 cells co-expressing
Dopamine receptor D1 (D1-βgalΔα) and β-arrestin-2 (βArr2-βgalΔω). The clone
expressing βArr2-βgalΔω (Arr2 alone) was used as a negative control in the assay.
Cells expressing D1-βgalΔα in addition to βArr2-βgalΔω responded to agonist
treatment (3-hydroxytyramine hydrochloride at 3 µM). D1(PIC2) or D1(PIC3)
designate D1 in expression vector pICAST ALC2 or pICAST ALC4, respectively.

FIGURE 8. A variety of mammalian cell lines can be used to generate stable
cells for monitoring GPCR and arrestin interactions. FIGURE 8A, FIGURE 8B and
FIGURE 8C show the examples of HEK 293, CHO and CHW cell lines
co-expressing adrenergic receptor β2AR and arrestin fusion proteins of
β-galactosidase mutants. The β-galactosidase activity was used to monitor
agonist-induced interaction of β2AR and arrestin proteins.

FIGURE 9. Beta-gal complementation can be used to monitor β2
adrenergic receptor homo-dimerization. FIGURE 9A shows β-galactosidase
activity in HEK 293 clones co-expressing β2AR-βgalΔα and β2AR-βgalΔω.
FIGURE 9B shows a cAMP response to agonist (-)isoproterenol in HEK 293
clones co-expressing β2AR-βgalΔα and β2AR-βgalΔω. HEK 293 parental cells
were included in the assays as negative controls.

FIGURE 10A. pICAST ALC: Vector for expression of β-galΔα as a
C-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔα; GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal
ribosome entry site; ColE1ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 10B. Nucleotide sequence for pICAST ALC.

FIGURE 11A. pICAST ALN: Vector for expression of β-galΔα as an
N-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔα; GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal
ribosome entry site; ColE1ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 11B. Nucleotide sequence for pICAST ALN.

FIGURE 12A. pICAST OMC: Vector for expression of β-galΔω as a
C-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔω; GS Linker, (GGGGS)n; Hygro, hygromycin resistance gene; IRES,
internal ribosome entry site; ColE1ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 12B. Nucleotide sequence for pICAST OMC.

FIGURE 13A. pICAST OMN: Vector for expression of β-galΔω as an
N-terminal fusion to the target protein. This construct contains the following
features: MCS, multiple cloning site for cloning the target protein in frame with the
β-galΔω; GS Linker, (GGGGS)n; Hygro, hygromycin resistance gene; IRES,
internal ribosome entry site; ColE1ori, origin of replication for growth in E. coli;
5'MoMuLV LTR and 3'MoMuLV LTR, viral promoter and polyadenylation
signals from the Moloney Murine leukemia virus.

FIGURE 13B. Nucleotide sequence for pICAST OMN.

FIGURE 14. pICAST ALC βArr2: Vector for expression of β-galΔα as a
C-terminal fusion to β-arrestin-2. The coding sequence of human β-arrestin-2
(GenBank Accession Number: NM_004313) was cloned in frame to β-galΔα in a
pICAST ALC vector.

FIGURE 15. pICAST OMC βArr2: Vector for expression of β-galΔω as a
C-terminal fusion to β-arrestin-2. The coding sequence of human β-arrestin-2
(GenBank Accession Number: NM_004313) was cloned in frame to β-galΔω in a
pICAST OMC vector.

FIGURE 16. pICAST ALC βArr1: Vector for expression of β-galΔα as a
C-terminal fusion to β-arrestin-1. The coding sequence of human β-arrestin-1
(GenBank Accession Number: NM_004041) was cloned in frame to β-galΔα in a
pICAST ALC vector.

FIGURE 17. pICAST OMC βArr1: Vector for expression of β-galΔω as a
C-terminal fusion to β-arrestin-1. The coding sequence of human β-arrestin-1
(GenBank Accession Number: NM_004041) was cloned in frame to β-galΔω in a
pICAST OMC vector.

FIGURE 18. pICAST ALC β2AR: Vector for expression of β-galΔα as a
C-terminal fusion to β2 Adrenergic Receptor. The coding sequence of human β2
Adrenergic Receptor (GenBank Accession Number: NM_000024) was cloned in
frame to β-galΔα in a pICAST ALC vector.

FIGURE 19. pICAST OMC β2AR: Vector for expression of β-galΔω as a
C-terminal fusion to β2 Adrenergic Receptor. The coding sequence of human β2
Adrenergic Receptor (GenBank Accession Number: NM_000024) was cloned in
frame to β-galΔω in a pICAST OMC vector.

FIGURE 20. pICAST ALC A2aR: Vector for expression of β-galΔα as a
C-terminal fusion to Adenosine 2a Receptor. The coding sequence of human
Adenosine 2a Receptor (GenBank Accession Number: NM_000675) was cloned
in frame to β-galΔα in a pICAST ALC vector.

FIGURE 21. pICAST OMC A2aR: Vector for expression of β-galΔω as a
C-terminal fusion to Adenosine 2a Receptor. The coding sequence of human
Adenosine 2a Receptor (GenBank Accession Number: NM_000675) was cloned
in frame to β-galΔω in a pICAST OMC vector.

FIGURE 22. pICAST ALC D1: Vector for expression of β-galΔα as a
C-terminal fusion to Dopamine D1 Receptor. The coding sequence of human
Dopamine D1 Receptor (GenBank Accession Number: X58987) was cloned in
frame to β-galΔα in a pICAST ALC vector.

FIGURE 23. A schematic depicting use of the complementation
technology in the method of the invention. FIGURE 23 shows two inactive
mutant reporter enzymes that become active when the corresponding fusion
partners, GPCR and β-arrestin, interact.

FIGURE 24. Vector for expression of a GPCR with inserted
serine/threonine amino acid sequences as a fusion with β-galΔα. The open
reading frame of a known or orphan GPCR is engineered to contain additional
serine/threonine sequences, such as SSS (serine, serine, serine), within
the C-terminal tail. The engineered GPCR is cloned in frame with β-galΔα in a
pICAST ALC vector. The pICAST ALC vector contains the following features:
MCS, multiple cloning site for cloning the target protein in frame with the β-galΔα;
GS Linker, (GGGGS)n; NeoR, neomycin resistance gene; IRES, internal ribosome
entry site; ColE1ori, origin of replication for growth in E. coli; 5'MoMuLV LTR
and 3'MoMuLV LTR, viral promoter and polyadenylation signals from the
Moloney Murine leukemia virus.

FIGURE 25. Vector for expression of mutant (R170E) β-arrestin2 as a
fusion with β-galΔω. The open reading frame of β-arrestin2 is engineered to
contain a point mutation that converts arginine 170 to a glutamate. The mutant
β-arrestin2 is cloned in frame with β-galΔω in a pICAST OMC vector. The pICAST
OMC vector contains the following features: MCS, multiple cloning site for
cloning the target protein in frame with the β-galΔω; GS Linker, (GGGGS)n;
Hygro, hygromycin resistance gene; IRES, internal ribosome entry site; ColE1ori,
origin of replication for growth in E. coli; 5'MoMuLV LTR and 3'MoMuLV LTR,
viral promoter and polyadenylation signals from the Moloney Murine leukemia
virus.

FIGURE 26. Phosphorylation insensitive Mutant R170E β-Arrestin2Δω
binds to β2ARΔα in Response to Agonist Activation. A parental β2ARΔα C2 cell
line was transduced with the Mutant R170E β-Arrestin2Δω construct. Clonal
populations co-expressing the two constructs were plated at 10,000 cells/well in
96 well plates and treated with 10 µM (-)isoproterenol, 0.3 mM ascorbic acid for the
indicated time period. β-galactosidase activity was measured by addition of Tropix
Gal-Screen™ assay system substrate (Applied Biosystems) and luminescence was
measured using a Tropix TR717™ luminometer (Applied Biosystems). Treatments
were performed in triplicate. For comparison, a clonal cell line (43-8)
co-expressing β2ARΔα and wild-type β-Arrestin2Δω was also plated at 10,000
cells/well and given the same agonist treatment regimen. Minutes of
(-)isoproterenol treatment is shown on the X-axis and β-galactosidase activity
indicated by relative light units (RLU) is shown on the Y-axis.

FIGURE 27. GPCR dimerization measured by β-galactosidase
complementation. A schematic depicting the utilization of the invention for
monitoring GPCR homo- or hetero-dimerization. One GPCR is fused to one
complement enzyme fragment, while the second GPCR is fused to the second
complement enzyme fragment. Interaction of the two GPCRs is monitored by
complementation of the enzyme fragments to produce an active enzyme complex
(i.e., β-galactosidase activity). GPCR homo- or hetero-dimerization can be
monitored in the absence or presence of ligand, agonists, inverse agonists or
antagonists.

FIGURE 28. Ligand fishing for orphan receptors by β-galactosidase
mutant complementation in ICAST™ system. A schematic depicting the
utilization of the invention for ligand fishing and agonist/antagonist screening for
orphan GPCRs. As an example, a test cell expressing two β-gal fusion proteins,
GPCRorphan-Δα and Arrestin-Δω, is subjected to treatments with samples from
natural or synthetic compound libraries, or from tissue extracts, or from
conditioned media of cultured cells. An increased β-gal activity after treatment
indicates the activation of the orphan receptor by a ligand in the testing sample.
The readout of increased β-gal activity reflects the interaction of an activated
GPCR orphan receptor with a β-arrestin. Therefore, a cognate or a surrogate ligand
for the testing receptor is identified.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
The present invention provides a method to interrogate GPCR function and
pathways. The G-protein-coupled superfamily continues to expand rapidly as new
receptors are discovered through automated sequencing of cDNA libraries or
genomic DNA. It is estimated that several thousand GPCRs may exist in the
human genome. Only a portion have been cloned and even fewer have been
associated with ligands. The means by which these, or newly discovered orphan
receptors, will be associated with their cognate ligands and physiological functions
represents a major challenge to biological and biomedical research. The
identification of an orphan receptor generally requires an individualized assay and
a guess as to its function. The present invention involves the interrogation of
GPCR function by monitoring the activation of the receptor using
activation-dependent protein-protein interactions between the test GPCR or orphan receptor
and a β-arrestin. The specific protein-protein interactions are measured using the
mutant enzyme complementation technology disclosed herein. This assay system
eliminates the prerequisite guessing because it can be performed with and without
prior knowledge of other signaling events. It is sensitive, rapid and easily
performed and is applicable to nearly all GPCRs because the majority of these
receptors desensitize by a common mechanism.

The present invention provides a complete assay system for monitoring
protein-protein interactions in GPCR pathways. The invention employs the
complementation technology, ICAST™ (Intercistronic Complementation Analysis
Screening Technology as disclosed in pending U.S. patent application serial no.
053,614, filed April 1, 1998, the entire contents of which are incorporated herein
by reference). The ICAST™ technology involves the use of two mutant forms of a
reporter enzyme fused to proteins of interest. When the proteins of interest do not
interact, the reporter enzyme remains inactive. When the proteins of interest do
interact, the reporter enzyme mutants come together and form an active enzyme.
According to an embodiment of the invention, the activity of β-galactosidase may
be detected with the Gal-Screen™ assay system developed by Advanced Discovery
Sciences™, which involves the use of Galacton-Star®, an ultrasensitive
chemiluminescent substrate. The Gal-Screen™ assay system and the
Galacton-Star® chemiluminescent substrate are disclosed in U.S. Patent Nos. 5,851,771;
5,538,847; 5,326,882; 5,145,772; 4,978,614; and 4,931,569, the contents of which
are incorporated herein by reference in their entirety. The invention provides an
array of assays, including GPCR binding assays, that can be achieved directly
within the cellular environment in a rapid, non-radioactive assay format. The
methods of the invention are an advancement over the invention disclosed in U.S.
Patent Nos. 5,891,646 and 6,110,693 and the method disclosed in Angers et al.,
supra, which rely on microscopic imaging or spectrometry of GPCR components
as fusions with green fluorescent protein. The imaging technique disclosed in U.S.
Patent Nos. 5,891,646 and 6,110,693 and the spectrometry-based technique in Angers
et al. are limited by low throughput and a lack of thorough quantification.

The assay system of the invention combined with Advanced Discovery
Sciences™ technologies provides highly sensitive cell-based methods for
interrogating GPCR pathways which are amenable to high-throughput screening
(HTS). Among the technologies developed by Advanced Discovery
Sciences™ that may be used with the present invention are the Gal-Screen™ assay
system (discussed above) and the cAMP-Screen™ immunoassay system. The
cAMP-Screen™ immunoassay system provides ultrasensitive determination of
cAMP levels in cell lysates. The cAMP-Screen™ assay utilizes the high-sensitivity
chemiluminescent alkaline phosphatase (AP) substrate CSPD® (disodium
3-(4-methoxyspiro{1,2-dioxetane-3,2'-(5'-chloro)tricyclo[3.3.1.1^3,7]decan}-4-yl)phenyl
phosphate) with Sapphire-II™ luminescence enhancer.

Unlike yeast-based two-hybrid assays used to monitor protein/protein
interactions in high-throughput assays, the present invention (1) is applicable to a
variety of cells including mammalian cells, plant cells, protozoa cells such as E.
coli and cells of invertebrate origin such as yeast, slime mold (Dictyostelium) and
insects; (2) detects interactions at the membrane at the site of the receptor target or
in the cytosol at the site of downstream target proteins rather than a limited cellular
localization, i.e., nucleus; and (3) does not rely on indirect read-outs such as
transcriptional activation. The present invention thus provides assays with greater
physiological relevance and fewer false positives.

The present inventors have developed modifications to the embodiment
disclosed in U.S. patent application serial no. 053,614 described above in order to
enhance the sensitivity of the inventive GPCR assay. According to an
embodiment, the invention incorporates the use of serine/threonine clusters to
enhance and prolong the interaction of GPCR with arrestin in order to make the
detection more robust. The clusters can be utilized for orphan receptors or known
GPCRs that do not have this sequence motif. By adding this sequence to the
C-terminal tail of the receptor, the activation of the receptor can be detected more
readily by readouts of arrestin binding to GPCR, i.e., β-galactosidase
complementation from fusion proteins of target proteins with β-galactosidase
mutants.

According to another embodiment, the invention incorporates the use of
arrestin point mutations to bypass the requirement of phosphorylation, by the
action of a specific GRK, on the C-terminal tail or intracellular loops of the GPCR upon
activation. The applications include cases i) wherein the cognate GRK for a particular
GPCR or orphan receptor is unknown; and ii) wherein the specific GRK for the
receptor of interest (or under test) may not be present or may have low activity in
the host cell that is used for the receptor activation assay.

According to another embodiment, the invention incorporates the use of a
super arrestin to increase the binding efficiency of arrestin to an activated GPCR
and to stabilize the GPCR/arrestin complex during GPCR desensitization. This
application can be used to increase the robustness of ICAST/GPCR applications in
cases where the GPCR is normally resensitized rapidly post desensitization.
Each of these methodologies is discussed below.
The invention will now be described in the following non-limiting
examples.

EXAMPLE

According to an embodiment of the invention, GPCR activation is
measured through monitoring the binding of arrestin to ligand-activated GPCR. In
this assay system, a GPCR, e.g., β-adrenergic receptor (β2AR), and an arrestin,
e.g., β-arrestin, are co-expressed in the same cell as fusion proteins with mutant
forms of a reporter enzyme, e.g., β-galactosidase (β-gal). As illustrated in Figure
23, the β2AR is expressed as a fusion protein with the Δα form of β-gal mutant
(β2ARΔα) and the β-arrestin as a fusion protein with the Δω form of β-gal mutant
(β-ArrΔω). The two fusion proteins, which at first exist in a resting (or
un-stimulated) cell in separate compartments, the membrane for GPCR and the
cytosol for arrestin, cannot form an active β-galactosidase enzyme. When such a
cell is treated with an agonist or a ligand, the ligand-occupied and activated
receptor becomes a high affinity binding site for arrestin. The interaction between
an activated GPCR, β2ARΔα, and arrestin, β-ArrΔω, drives the β-gal mutant
complementation. The enzyme activity can be measured by using an enzyme
substrate, which upon cleavage releases a product measurable by colorimetry,
fluorescence, or chemiluminescence (e.g., the Gal-Screen™ assay system).

Experiment protocol -

1. In the first step, the expression vectors for β2ARΔα and βArr2Δω were
engineered in selectable retroviral vectors pICAST ALC, as described in Figure 18,
and pICAST OMC, as described in Figure 15.

2. In the second step, the two expression constructs were transduced into
either C2C12 myoblast cells, or other mammalian cell lines, such as COS-7, CHO,
A431, HEK 293, and CHW. Following selection with antibiotic drugs, stable
clones expressing both fusion proteins at appropriate levels were selected.

3. In the last step, the cells expressing both β2ARΔα and βArr2Δω were
tested for response by agonist/ligand stimulated β-galactosidase activity. Triplicate
samples of cells were plated at 10,000 cells in 100 microliter volume into wells of a
96-well culture plate. Cells were cultured for 24 hours before assay. For agonist
assay (Figures 3 and 4), cells were treated with variable concentrations of agonist,
for example, (-)isoproterenol, procaterol, dobutamine, terbutaline or
L-phenylephrine for 60 min at 37° C. The induced β-galactosidase activity was
measured by addition of Tropix Gal-Screen™ assay system substrate (Applied
Biosystems) and luminescence measured in a Tropix TR717™ luminometer
(Applied Biosystems). For antagonist assay (Figure 5), cells were pre-incubated for
10 min in fresh medium without serum in the presence of ICI-118,551 or
propranolol followed by addition of 10 micromolar (-)isoproterenol.
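
The triplicate luminescence readings from step 3 can be reduced to a mean, a standard deviation and a fold induction over untreated wells. The following is a minimal Python sketch of that arithmetic, not part of the disclosed method; the function name `summarize_wells` and every RLU value are hypothetical and were invented purely for illustration.

```python
from statistics import mean, stdev


def summarize_wells(rlu_by_dose):
    """Return {dose: (mean, sd, fold_over_basal)} from replicate RLU readings.

    rlu_by_dose maps agonist concentration in molar (0.0 = untreated wells)
    to a list of replicate luminescence readings (relative light units).
    """
    basal = mean(rlu_by_dose[0.0])  # untreated wells define the baseline
    summary = {}
    for dose, wells in sorted(rlu_by_dose.items()):
        m = mean(wells)
        summary[dose] = (m, stdev(wells), m / basal)
    return summary


# Hypothetical readings, invented purely for illustration (not measured data)
example = {
    0.0: [1000.0, 1100.0, 900.0],
    1e-8: [1500.0, 1400.0, 1600.0],
    1e-5: [4000.0, 4200.0, 3800.0],
}

for dose, (m, sd, fold) in summarize_wells(example).items():
    print(f"{dose:.0e} M: mean={m:.0f} RLU, sd={sd:.0f}, fold={fold:.2f}")
```

Expressing the signal as fold over the untreated wells is one common normalization; the source reports raw RLU, so this choice is an assumption.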

Serine/Threonine Cluster Strategy
Background

Based on structure-function relationship studies on β-arrestins, a large
region within the amino-terminal half of β-arrestins (termed the
activation-recognition domain) recognizes the agonist-activated state of GPCRs. This region
of β-arrestin also contains a small positively charged domain (approximately 20
amino acids with net charge +7) called the phosphorylation-recognition domain,
which appears to interact with the GRK-phosphorylated carboxyl termini of
GPCRs.

GPCRs can be divided into two classes based on their affinities for
β-arrestins. Oakley et al., "Association of β-Arrestin with G Protein-Coupled
Receptors During Clathrin-Mediated Endocytosis Dictates the Profile of Receptor
Resensitization." J. Biol. Chem., 274(45):32248-32257 (1999). The molecular
determinants underlying this classification appear to reside in specific serine or
threonine residues located in the carboxyl-terminal tail of the receptor. The
receptor class that contains serine/threonine clusters (defined as serine or threonine
residues occupying three consecutive or three out of four positions) in the
carboxyl-termini binds β-arrestin with high affinity upon activation and
phosphorylation and remains bound with β-arrestin even after receptor
internalization, whereas the receptor class that contains only scattered serine and
threonine residues in the carboxyl-terminal tail binds β-arrestins with less affinity
and disassociates from the β-arrestin upon internalization. Several known GPCRs,
such as vasopressin V2 receptor (Oakley et al.), neurotensin receptor 1 and
angiotensin II receptor type 1A (Zhang et al., "Cellular Trafficking of G
Protein-Coupled Receptor/β-Arrestin Endocytic Complexes." J. Biol. Chem.,
274(16):10999-11006 (1999)), which possess one or more of such serine/threonine
clusters in their carboxyl-termini, were shown to bind β-arrestins with high
affinity.

EXAMPLE

According to an embodiment of the invention, a serine/threonine cluster
strategy is used to facilitate screening assays for orphan receptors that do not
possess this structural motif of their own. The orphan receptors are easily classified
by sequence alignment. Orphan receptors lacking the serine/threonine clusters are
each cloned into an expression vector that is modified to introduce one or more
serine/threonine cluster(s) to the carboxyl-terminal tail of the receptor (FIGURE
24). The serine/threonine clusters enhance the receptor activation dependent
interaction between the activated and phosphorylated receptor (negative charges)
and β-arrestin (positive charges in the phosphorylation-recognition domain)
through strong ionic interactions, thus prolonging interaction between the receptor
and arrestin. The modification of the orphan receptor tail thus makes detection of
receptor activation more robust.
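
The cluster definition given in the Background (serine or threonine at three consecutive positions, or at three out of four positions) lends itself to a simple sequence scan. The Python sketch below is one possible way to flag carboxyl-terminal tails that lack such a cluster. It is an illustration only: the function names, the receptor names and the tail sequences are hypothetical, and the source itself classifies receptors by sequence alignment rather than by this scan.

```python
def has_ser_thr_cluster(tail: str) -> bool:
    """True if S/T occupy three consecutive positions, or three out of any
    four consecutive positions, in a one-letter amino acid sequence."""
    hits = [residue in "ST" for residue in tail.upper()]
    three_in_a_row = any(all(hits[i:i + 3]) for i in range(len(hits) - 2))
    three_of_four = any(sum(hits[i:i + 4]) >= 3 for i in range(len(hits) - 3))
    return three_in_a_row or three_of_four


def receptors_lacking_cluster(tails: dict) -> list:
    """Names of receptors whose C-terminal tail has no Ser/Thr cluster."""
    return [name for name, tail in tails.items() if not has_ser_thr_cluster(tail)]


# Hypothetical tail sequences, invented for illustration only
tails = {
    "orphanA": "GQASTSLP",   # contains STS, three consecutive S/T residues
    "orphanB": "GSAQTLKSP",  # only scattered S/T residues
}

print(receptors_lacking_cluster(tails))
# Appending an SSS motif to the tail, as in FIGURE 24, creates a cluster
print(has_ser_thr_cluster(tails["orphanB"] + "SSS"))
```

In this sketch only "orphanB" would be routed to the modified vector that introduces a cluster.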

Experiment protocol -

1. In a first step, the open-reading-frame (ORF) of an orphan receptor,
which lacks the serine/threonine clusters, is cloned into a modified expression
vector such as pICAST ALC described in Figure 10A. The modified pICAST ALC
includes coding sequences for one or more sets of serine/threonine clusters (for
example, SSS or SST) located downstream from the insert of the ORF of an orphan
receptor (FIGURE 24).

2. In a second step, chimeric orphan receptor, ORForphanR-(SSS)n-Δα, is
co-expressed in a mammalian cell with a β-arrestin chimera, such as
βArr2Δω described in Figure 15.

3. In a third step, the cell is treated with an agonist or a ligand and the
activated receptor with phosphorylated serine cluster(s) binds the β-arrestin with
high affinity, producing strong signals in readouts of β-gal complementation.

This assay, which provides a means for sensitive measurement of functional
activation of the orphan receptors, can be used to screen for natural or surrogate
ligands for orphan receptors, a process called de-orphaning or target discovery for
new GPCRs (FIGURE 28). Furthermore, this assay is also useful in screening for
potential agonists and antagonists for lead discovery of GPCRs.

Enhanced Binding of Arrestin in the Presence and in the Absence of GPCR
Phosphorylation

Background

Six different classes of G-protein coupled receptor kinases (GRKs) have
been identified and each of these has been reported to be expressed as multiple
splice variants. Krupnick et al., "The role of receptor kinases and arrestins in G
protein-coupled receptor regulation." Ann. Rev. Pharmacol. Toxicol., 38:289-319
(1998). Although many cell lines express a variety of GRKs, the specific GRK
required for phosphorylation of a given GPCR may not always be present in the
cell line used for recombinant GPCR and arrestin expression. This is particularly
an issue for applications using orphan receptors, in which case the cognate GRK
will likely be unknown. In other cases, the cell line used for recombinant
expression work may have the required GRK, but may express the GRK at low
levels. In order to bypass such caveats, genetically modified arrestins that bind
specifically to activated GPCRs, but without the requirement of GRK
phosphorylation, are employed.

WO 01/58923    PCT/US01/00684

Mutagenesis studies on arrestins demonstrate that point mutations in the
phosphorylation-recognition domain, particularly mutations converting Arg175 (of
visual arrestin) to an oppositely charged residue such as glutamate (R175E
mutation), result in an arrestin which specifically binds to activated GPCRs, but
does so without the requirement for phosphorylation.

Numerous observations have led to the hypothesis that arrestin exists in an
inactive state that has a low affinity for GPCRs. Once a GPCR is both activated
and phosphorylated, the phosphorylated region of the GPCR C-terminus interacts
with the phosphorylation-recognition domain of arrestin, causing the arrestin to
change conformation and allowing the activation-recognition region to be exposed for
binding to the activated/phosphorylated receptor. Vishnivetskiy et al., "How does
arrestin respond to the phosphorylated state of rhodopsin?" J. Biol. Chem.,
274(17):11451-11454 (1999); Gurevich et al., "Arrestin interactions with G
protein-coupled receptors. Direct binding studies of wild-type and mutant arrestins
with rhodopsin, beta 2-adrenergic and m2 muscarinic cholinergic receptors." J.
Biol. Chem., 270(2):720-731 (1995); Gurevich et al., "Mechanism of
phosphorylation-recognition by visual arrestin and the transition of arrestin into a
high affinity binding site." Mol. Pharmacol., 51(1):161-169 (1997); Kovoor et al.,
"Targeted construction of phosphorylation-independent beta-arrestin mutants with
constitutive activity in cells." J. Biol. Chem., 274(11):6831-6834 (1999). In
summary, binding studies of single mutation, double mutation, deletion, and
chimerical arrestins with inactive, inactive and phosphorylated, activated but not
phosphorylated, or activated and phosphorylated visual or non-visual GPCRs all
support this model.

A phosphorylation insensitive mutant of arrestin fused to mutant reporter
protein can be produced that will bind to activated GPCRs in a phosphorylation
independent manner. As proof of concept, a point mutation for β-arrestin2, R170E
β-arrestin2, has been produced and its interaction with β2AR has been analyzed in
accordance with the invention.
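
The mutation notation used here (R170E, meaning arginine at position 170 replaced by glutamate) can be applied to a protein sequence programmatically. The Python sketch below shows one way to do this with a check that the expected wild-type residue is actually present at the stated position. The function name `apply_point_mutation` is hypothetical, and the toy sequence is a placeholder, not the β-arrestin2 sequence.

```python
def apply_point_mutation(protein: str, mutation: str) -> str:
    """Apply a substitution written like 'R170E' (1-based residue position).

    Raises ValueError if the position is out of range or the residue found
    there does not match the wild-type residue named in the mutation.
    """
    wild_type, mutant = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    index = position - 1  # convert to 0-based string index
    if not 0 <= index < len(protein):
        raise ValueError(f"position {position} is outside the sequence")
    if protein[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {protein[index]}"
        )
    return protein[:index] + mutant + protein[index + 1:]


# Toy placeholder sequence for illustration only (not beta-arrestin2)
toy = "MGEKR"
print(apply_point_mutation(toy, "R5E"))
```

The wild-type check guards against numbering mismatches, which matter here because the equivalent residue is numbered differently in different arrestins (Arg175 in visual arrestin, Arg170 in β-arrestin2).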

Experimental protocol:

1) In the first step, β-arrestin2 was mutated such that Arg170 was converted to
Glu. This mutation is equivalent to the R175E mutation of visual arrestin. The
mutant β-arrestin2 open reading frame was cloned in frame with
Δω-β-galactosidase in the pICAST OMC expression vector to produce a modified
expression vector R170E β-arrestin2 (FIGURE 25).

2) In the second step, the R170E β-arrestin2 expression construct was
transduced into a C2C12 myoblast cell line that had been engineered to express
β2AR as a fusion to Δα-β-galactosidase as described in Figure 18 of U.S.
Application Serial No. 09/654,499. Following selection with antibiotic drugs, a
population of clones expressing both fusion proteins was obtained.

3) In the last step, this population of cells expressing both R170E
β-arrestin2Δω and β2ARΔα was tested for response by agonist/ligand stimulated
β-galactosidase activity as demonstrated in FIGURE 26. The C2C12 clone 43-8
co-expressing β2ARΔα and wild-type β-arrestin2Δω (FIGURE 26) was used as
reference control. Triplicate samples of cells were plated at 10,000 cells in 100
microliter volume into wells of a 96-well culture plate. Cells were cultured for 24
hours before assay. For agonist assay as in FIGURE 26, cells were treated with
10 µM (-)isoproterenol stabilized with 0.3 mM ascorbic acid at 37° C for 0, 5, 10, 15,
30, 45 or 60 minutes. The induced β-galactosidase activity was measured by
addition of Tropix Gal-Screen™ assay system substrate (Applied Biosystems) and
luminescence measured in a Tropix TR717™ luminometer (Applied Biosystems).
As shown in Figure 26, the mutant arrestin interacts with β2AR in an
agonist-dependent manner, and the interaction was comparable with that of wild-type arrestin.

4) To expand the application of phosphorylation-insensitive arrestin, cell lines
such as C2C12, CHO or HEK 293 are developed that express the R170E
β-arrestin2Δω construct. These cell lines can be used to transduce orphan or
known GPCRs as fusions with Δα-β-galactosidase in order to develop cell lines for
agonist and antagonist screening and

Development of Super Arrestins:
Background 

Attenuation of GPCR signaling by the arrestin pathway serves to ensure
that a cell or organism does not over-react to a stimulus. At the same time, the
arrestin pathway often serves to recycle the GPCR such that it can be temporarily
inactivated but then quickly resensitized to allow for sensitivity to new stimuli.
The down-regulation process involves phosphorylation of the receptor, binding to
arrestin and endocytosis. Following endocytosis of the desensitized receptor, the
receptor is either degraded in lysosomes or resensitized and sent back to the
membrane. Resensitization involves release of arrestin from the receptor,
dephosphorylation and cycling back to the membrane. The actual route a GPCR
follows upon activation depends on its biological function and the needs of the
organism. Because of the diverse routes that the down-regulation
pathway may be required to follow, arrestin affinities for activated GPCRs vary from receptor to
receptor. It would thus be very advantageous to engineer super arrestins that have
a higher affinity and avidity for activated GPCRs than what nature has provided.

Although mutational, deletion and chimerical studies of arrestins have
focused on understanding regulatory switches in the molecule that respond to
GPCR phosphorylation states, several of these altered recombinant forms of
arrestin have resulted in molecules with enhanced binding to activated,
phosphorylated GPCRs. Conversion of Arg175 to histidine, tyrosine,
phenylalanine or threonine results in significantly higher amounts of binding to
phosphorylated, activated rhodopsin than wild-type arrestin or R175E arrestin,
although these mutations result in less binding to activated, non-phosphorylated
receptor. Gurevich et al. (1997). In addition, conversion of Valine 170 to alanine
increased the constitutive effect of the R175E mutation, but also nearly doubled the
amount of interaction of wild-type arrestin with activated, phosphorylated
rhodopsin. Gurevich et al. (1997).
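
The substitutions discussed above are written in the usual shorthand of wild-type residue, position, and replacement residue (for example R175E). As a minimal sketch of how such a label maps onto a protein sequence, the helper below applies one substitution and refuses to proceed if the stated wild-type residue is not actually present at that position. The function name and the five-residue test sequence are hypothetical and are used only for illustration; no arrestin sequence is reproduced here.

```python
import re

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def apply_point_mutation(sequence, mutation):
    """Apply a substitution such as 'R175E' to a one-letter protein sequence.

    Positions are 1-based, as in the shorthand itself. A ValueError is raised
    if the label is malformed, the position is out of range, or the residue
    found at that position differs from the stated wild-type residue.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild_type, position, replacement = match.group(1), int(match.group(2)), match.group(3)
    if wild_type not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {mutation!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}"
        )
    return sequence[: position - 1] + replacement + sequence[position:]


# Hypothetical toy sequence, not taken from any arrestin.
toy_sequence = "MKRAV"
mutated = apply_point_mutation(toy_sequence, "R3E")
```

Checking the wild-type residue before substituting is a simple guard against numbering mismatches between homologous proteins, such as the offset between R175 in visual arrestin and R169 in β-arrestin1 noted in the text.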

Truncation of β-arrestin1 at amino acid 382 has been reported to enhance
binding of both R169E (equivalent to arrestin R175E) and wild-type β-arrestin1 to
activated or activated and phosphorylated receptor, respectively. Kovoor et al.
Chimerical arrestins in which functional regions of visual arrestin were swapped
with those of β-arrestin1 have been reported to be altered in binding affinity to
activated, phosphorylated GPCRs. Gurevich et al. (1995). Several of these
chimeras, such as β-arrestin1 containing the visual arrestin extreme N-terminus,
show increased specific binding to phosphorylated activated GPCRs compared to
wild-type β-arrestin1 (Gurevich et al. (1995)). Modifications that enhance arrestin
affinity for the activated GPCR such as described above, whether phosphorylated
or non-phosphorylated, could also enhance signal to noise of β-galactosidase
activity since the arrestin/GPCR complex is stabilized and/or more long-lived. The
use of mutant arrestins with higher activated-GPCR affinity would improve the
inventive technology for GPCR targets, without compromising receptor/ligand
biology.

In addition, this "super arrestin" approach can be combined with the use of 
arrestin point mutations to provide a stronger signal to noise with or without GRK 
requirements. 
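
The set of candidate constructs implied by combining these approaches can be enumerated mechanically. The sketch below is illustrative only. It assumes the alterations named in step 1 of the protocol in the EXAMPLE (R170E/T/Y/H, V165A, substitution of a.a. 1-43 with a.a. 1-47 of visual arrestin, and deletion of the C-terminal), treats the four R170 substitutions as mutually exclusive because they occupy the same residue, and uses labels and a function name that are hypothetical rather than taken from the source.

```python
from itertools import product

# At most one substitution can occupy position 170; None means "leave Arg170 as is".
R170_OPTIONS = [None, "R170E", "R170T", "R170Y", "R170H"]

# Alterations that can each be present or absent independently (labels are illustrative).
OPTIONAL_ALTERATIONS = [
    "V165A",
    "N-terminal swap (visual arrestin a.a. 1-47 for a.a. 1-43)",
    "C-terminal deletion",
]


def enumerate_constructs():
    """List every non-wild-type combination of the alterations above."""
    toggles = [(False, True)] * len(OPTIONAL_ALTERATIONS)
    constructs = []
    for r170, *flags in product(R170_OPTIONS, *toggles):
        parts = [r170] if r170 else []
        parts += [name for name, on in zip(OPTIONAL_ALTERATIONS, flags) if on]
        if parts:  # skip the unmodified, wild-type combination
            constructs.append(tuple(parts))
    return constructs


constructs = enumerate_constructs()
```

Under these assumptions there are 5 x 2 x 2 x 2 = 40 combinations, of which one is the unmodified protein, leaving 39 candidates. Whether all of them are worth building is an experimental judgment that the enumeration does not address.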
EXAMPLE

An arrestin mutant fused to mutant reporter protein can be produced to 
enhance binding of the arrestin to an activated GPCR to enhance sensitivity of 
detection. 
Experiment protocol -

1) In the first step, mutant β-arrestin2 constructions will be generated which
include R170E/T/Y or H, V165A, substitution of a.a. 1-43 with a.a. 1-47 of visual
arrestin, or deletion of the C-terminal, and combinations of these alterations. The
mutant β-arrestin2 open reading frames will be cloned in frame with
Δω-β-galactosidase in the pICAST OMC expression vector, similar to cloning of the
R170E β-arrestin2 mutation shown in FIGURE 25.

2) In the second step, mutant expression constructs will be transduced into a
C2C12 myoblast cell line that has been engineered to express β2AR as a fusion to
Δα-β-galactosidase. Following selection with antibiotic drugs, a population of
clones expressing both fusion proteins will be obtained. Wild-type and R170E
β-arrestin2 constructions will be transduced to generate control, reference clonal
populations.

3) In the third step, populations of cells expressing both β-arrestin2Δω (mutant
or wild type) and β2ARΔα will be tested for response by agonist/ligand-stimulated
β-galactosidase activity.

4) In the next step, mutant (super) β-arrestin2Δω constructions that show a
significantly higher signal to noise ratio in the agonist assay compared with
wild-type β-arrestin2Δω will be chosen. These constructions will be used to develop
stable cell lines expressing the "super" β-arrestin2Δω that can be used for
transducing in known or orphan GPCRs. Use of a super β-arrestin2Δω could
increase the signal to noise of ICAST/GPCR applications, allowing improved
screening capabilities for lead and ligand discovery.
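
The selection in step 4 can be sketched as a simple comparison of signal to noise ratios. The example below is a hedged illustration, not the procedure of the specification: it assumes that signal to noise is taken as mean agonist-stimulated luminescence divided by mean unstimulated luminescence across replicate wells, and that "significantly higher" is approximated by a fold threshold (`min_gain`) relative to wild type. Both choices, the function names, and all readings are hypothetical.

```python
from statistics import mean


def signal_to_noise(stimulated, unstimulated):
    """Mean stimulated luminescence divided by mean unstimulated luminescence."""
    baseline = mean(unstimulated)
    if baseline <= 0:
        raise ValueError("baseline luminescence must be positive")
    return mean(stimulated) / baseline


def select_super_arrestins(readings, reference="wild-type", min_gain=1.5):
    """Return (construct, S/N) pairs whose S/N is at least min_gain times the
    reference S/N, sorted from highest to lowest S/N.

    readings maps construct name -> (stimulated wells, unstimulated wells).
    min_gain is an assumed cut-off; the text does not specify one.
    """
    reference_sn = signal_to_noise(*readings[reference])
    selected = []
    for name, (stimulated, unstimulated) in readings.items():
        if name == reference:
            continue
        sn = signal_to_noise(stimulated, unstimulated)
        if sn >= min_gain * reference_sn:
            selected.append((name, sn))
    return sorted(selected, key=lambda item: item[1], reverse=True)


# Hypothetical triplicate luminescence readings (arbitrary units), not measured data.
example_readings = {
    "wild-type": ([2000, 2100, 1900], [1000, 950, 1050]),
    "R170E": ([2500, 2400, 2600], [1000, 1000, 1000]),
    "R170H+V165A": ([4000, 4200, 3800], [1000, 1000, 1000]),
}
chosen = select_super_arrestins(example_readings)
```

A replicate-aware statistical test would be a more defensible reading of "significantly higher" than a fixed fold threshold; the threshold is used here only to keep the sketch short.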

Super Arrestin is used to increase the binding efficiency of arrestin to an
activated GPCR and to stabilize the GPCR/arrestin complex during GPCR
desensitization. This application can be used to increase the robustness of
ICAST/GPCR applications in cases where the GPCR is normally resensitized
rapidly post desensitization.

The assays of this invention, and their application and preparation have
been described both generically, and by specific example. The examples are not
intended as limiting. Other substituent identities, characteristics and assays will
occur to those of ordinary skill in the art, without the exercise of inventive faculty.
Such modifications remain within the scope of the invention, unless excluded by
the express recitation of the claims advanced below.
WHAT IS CLAIMED IS:

1. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme,

wherein said cell also expresses an arrestin, wherein said arrestin is
modified to enhance binding of said arrestin to said GPCR, wherein said enhanced
binding between said arrestin and said GPCR increases sensitivity of detection of
said effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with its interacting protein partner compared to that which occurs in the
absence of said test condition, and decreased reporter enzyme activity in the cell
compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with its interacting protein partner compared to that
which occurs in the absence of said test condition.

2. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme;

wherein said GPCR fusion protein is modified to include one or more sets
of serine/threonine clusters, wherein said one or more sets of serine/threonine
clusters enhance binding of said GPCR to arrestin, wherein said enhanced binding
between said GPCR and said arrestin increases sensitivity of detection of said
effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with said interacting protein partner compared to that which occurs in
the absence of said test condition, and decreased reporter enzyme activity in the
cell compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with interacting protein partner compared to that
which occurs in the absence of said test condition.

3. A DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

4. A DNA construct capable of directing the expression of a biologically
active hybrid GPCR in a cell, comprising the following operatively linked
elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

5. A cell transformed with a DNA construct capable of expressing a
biologically active hybrid GPCR in a cell, comprising the following operatively
linked elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid GPCR, wherein said hybrid GPCR comprises a GPCR as a fusion protein to
one mutant form of reporter enzyme and wherein said hybrid GPCR is modified to
include one or more sets of serine/threonine clusters, wherein said one or more sets
of serine/threonine clusters enhance binding of said hybrid GPCR to arrestin.

6. A DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

7. A DNA construct capable of directing the expression of a biologically
active hybrid arrestin in a cell, comprising the following operatively linked
elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

8. A cell transformed with a DNA construct capable of expressing a
biologically active hybrid arrestin in a cell, comprising the following operatively
linked elements:

a promoter; and

a DNA molecule comprising a sequence encoding a biologically active
hybrid arrestin, wherein said hybrid arrestin comprises an arrestin as a fusion to
one mutant form of reporter enzyme and wherein said hybrid arrestin is modified
to enhance binding of said arrestin to GPCR.

9. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme,

wherein said cell also expresses an arrestin, wherein said arrestin is
modified by introducing a point mutation in a phosphorylation-recognition domain
to remove a requirement for phosphorylation of said GPCR for arrestin binding to
permit binding of said arrestin to said GPCR in said cell regardless of whether said
GPCR is phosphorylated,

b) exposing the cell to a ligand for said GPCR under said test condition; and

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with its interacting protein partner compared to that which occurs in the
absence of said test condition, and decreased reporter enzyme activity in the cell
compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with its interacting protein partner compared to that
which occurs in the absence of said test condition.

10. The method of Claim 9, wherein said arrestin is mutated to increase a
property selected from affinity and avidity for activated, non-phosphorylated
GPCR.

11. The method of Claim 10, wherein said arrestin is β-arrestin2 and
wherein said β-arrestin2 is mutated to convert Arg169 to an oppositely charged
residue.

12. The method of Claim 11, wherein said oppositely charged residue is
selected from the group consisting of histidine, tyrosine, phenylalanine and
threonine.

13. The method of Claim 9, wherein said arrestin is mutated to increase a 
property selected from affinity and avidity for activated and phosphorylated GPCR. 

14. A method of assessing the effect of a test condition on G-protein-coupled
receptor (GPCR) pathway activity, comprising:

a) providing a cell that expresses a GPCR as a fusion protein to one mutant
form of reporter enzyme and an interacting protein partner as a fusion to another
mutant form of enzyme;

wherein said GPCR fusion protein is modified to include one or more sets
of serine/threonine clusters, said one or more serine/threonine clusters defined as
serine or threonine residues occupying three consecutive or three out of four
positions in a carboxyl-termini of said GPCR, wherein said one or more sets of
serine/threonine clusters enhance binding of said GPCR to arrestin, wherein said
enhanced binding between said GPCR and said arrestin increases sensitivity of
detection of said effect of said test condition;

b) exposing the cell to a ligand for said GPCR under said test condition; and

c) monitoring activation of said GPCR by complementation of said reporter
enzyme;

wherein increased reporter enzyme activity in the cell compared to that
which occurs in the absence of said test condition indicates increased GPCR
interaction with said interacting protein partner compared to that which occurs in
the absence of said test condition, and decreased reporter enzyme activity in the
cell compared to that which occurs in the absence of said test condition indicates
decreased GPCR interaction with interacting protein partner compared to that
which occurs in the absence of said test condition.

15. The method of Claim 1, wherein said modified arrestin exhibits 
enhanced binding to activated, phosphorylated GPCR. 

25. The method of Claim 14, wherein said modified arrestin comprises
conversion of Arg170 to an amino acid selected from the group consisting of
histidine, tyrosine, phenylalanine and threonine.
Cellular Expression of β2AR-βgalΔα Fusion Protein in C2 Clones
(measured by anti-β-gal ELISA)
FIGURE 1A

Cellular Expression of βArr2-βgalΔω Fusion Protein in C2 Clones
(measured by anti-β-gal ELISA)
FIGURE 1B

Agonist Stimulated cAMP Response in C2 Cells Expressing β2AR-βgalΔα
FIGURE 2

β-galactosidase Complementation as a Measurement for β2AR-βgalΔα
Interacting with βArrestin2-βgalΔω upon Agonist Stimulation

β-galactosidase Complementation as a Measurement for β2AR-βgalΔα
Interaction with βArrestin1-βgalΔω upon Agonist Stimulation

β-galactosidase Activity in Response to Agonist in C2 Cells
Coexpressing β2AR-βgalΔα and βArrestin2-βgalΔω Fusion Proteins

β-galactosidase Activity in Response to Agonist in C2 Cells
Coexpressing β2AR-βgalΔα and βArrestin1-βgalΔω Fusion Proteins

Inhibition of β-galactosidase Activity in C2 Cells Coexpressing
β2AR-βgalΔα and βArrestin2-βgalΔω Fusion Proteins
FIGURE 5A

Antagonist Inhibition of β-galactosidase Activity in C2 Cells
Coexpressing β2AR-βgalΔα and βArrestin1-βgalΔω Fusion Proteins
FIGURE 5B

Agonist Stimulated cAMP Response in Clones or Pools of C2 Cells
Coexpressing A2aR-βgalΔα and βArrestin1-βgalΔω Fusion Proteins

Agonist Stimulated cAMP Response in Clones or Pools of C2 Cells
Expressing D1-βgalΔα and βArrestin2-βgalΔω Fusion Proteins

β2AR-βgalΔω and βArr2-βgalΔα Interaction in HEK293
Clones in Response to Isoproterenol Treatment (1 µM)

β2AR-βgalΔα and βArr1-βgalΔω Interaction in a CHO Pool
in Response to Isoproterenol Treatment (10 µM)
FIGURE 8B

β2AR-βgalΔα and βArr2-βgalΔω Interaction in CHW Clone
in Response to Isoproterenol Treatment (10 µM)

β-galactosidase Complementation as a Measurement for
Adrenergic Receptor Homodimerization in HEK 293 Cells
Coexpressing β2AR-βgalΔα and β2AR-βgalΔω.

Agonist Stimulated cAMP Response in HEK 293 Cells
Coexpressing β2AR-βgalΔα and β2AR-βgalΔω

1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTCCTG
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC 



51 CCCCGGCTCA GGGCC?.AGAA CACATGGAAC AGCTGAATAT GGGCCAAACA 
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTIATA CCCGGTTTGT 



101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG 6CTCAGGGCC AAGAACA6AT 
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCG6 TTCTTGTCTA 



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT 
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA 



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTG7GCC TTATTTGAAC 
CAAAGGTCCC ACGGGGTTCC TGCACTTTAC TGGGACACGG AATAAACTTG 



251 TAACCAATCA GTTCGCTTCT" CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA 
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGQCT 



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT 
cgagttattt tctcgggtgt tggggag'tga GCCCCGCGGT CAGGAGGCTA 



351 TGACTGAGTC GCCCGCGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG 

ACTGACTCAG cgggcccatc GGCACATAGG ttatttggga caacgtcaac 



401 catccgactt gtggtctcgc tgttccttgc gagcgtctcc tctgagtgat 
gtaggctgaa caccagagcg acaaggaacc ctcccagagg agactcacta 



4 51 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG 
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC 



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG caagctggcc 

ctctggggac gggtccctgg tggctgggtg gtggccctcc gttcgaccgg 



551 AG'CKfiCTTAT~CTGTGTCTGT-CCGATTGTC7r AGTGTCTKTG-MTGATTTTA- 
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTMCT AGCTCTGTAT CTGGCGGACC 
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCCCCTGG 



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG 
GCACCACCrT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC 



701 TCCCAGGGAC TTTGGCGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCXG.AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCR 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG 
GCTACACCTT AGGCTGGGGC AGTCCXATAC ACCRAGACCA TCCTCTGCTC 



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA 
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTCTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC RACAGAGACA 



901 CTGACTGTGT TTCTGTATTT 6TCTGAAAAT TAGGGCCAGA CTGTTACCAC 
GACTGACACA AAGACATAAA CAGACTTTTTA ATCCCGGTCT GACAATGGTG 
951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC

AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT 
TGTTCGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA CGTCGGP.TGG CCGCGAGACG GCACCTTTAA 
' CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT 



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC 
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG 



.1151 ATGGACACCC AGACCAGGTC CCCTACRTCG TGACgTGGGA AGCCTTGGCT 
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA 



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC 
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG 



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA 
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTG3A GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC 
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG 



1351 GGCCGCTCTA GCCCATTAAP ACGACTCACT ATAGGGCGAT TCGAATCAGG 
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTAGTCC 



1401 CCTTGCCGCG CCGGATCCTT AATFAAGCGC AATTCGGAGG TGGCGGTAGC 
GGAACCGCGC GGCCTAGGAA TTAATTCGCG TTAACCCTCX ACCGCCATCG 



+2 MG VIT D SL AVVA RTD 

] 

1451 CTCGAGATQG GCGTGATTAC GGATTCACTC GCCGTCCTGG CCCGCACCGA 
GAGCTCTACC CGCACTAATG CCTAAGTGAC CGGCAGCACC GGGCGTGGCT 



42 aPS QQLR SLN GEW RFA 



1501 TCGCCCTTCC CAACAGTTAC GCAGCCTGAA TGGCGAATGG CGCTTTGCCT 
AGCGGGAAGG GTTGTCAATG CGTCGGAGTT ACCGCTTflCC GCGAAACGGA 



+2WFPA PEA VPES WLE CDL 



1551 G6TTTCCGGC ACCAGAAGCG GTGCCGGAAA GCTGGCTrGGA GTGCGATCTT 
CCAAAGGCTG- TGGTCTTCGC CACGGCCTTT CGACCGACCT CACGCTAGAA 



+2PEAD TVV VPS NWQM HGY 



1601 CCTGAGGCCG ATACTGTCGT CGTCCCCTCA AACTGGCAGA T6CACGGTTA 
GGACTCCGGC TATGACAGCA GCAGGGGAGT TTGACCGTCT ACGTGCCAAT 



+2 DAP lYTN VTY PIT VNP 



1651 CGATGCGCCC ATCTACACCA ACGTGACCTA TCCCATTACG GTCAATCCGC 
GCTACGCGGG TAGATGTGGT TGCACTGGAT AGGGTAATGC CAGTTAGGCG 
+2 PFVP TEN PTGC YSL TFN



1701 CGTTTGTTCC CACGGAGAAT CCGACGGGTT GTTACTCGCT CACATTTAAt 
GCAAACAAGG GTGCCTCTTA' GGCTGCCCAA CAATGAGCGA GTGTAAATTA 



+2V0ES WLQ EGQ TRII FDG 



1751 GTTGATGAAA GCTGGCTACA GGAAGGCCAG ACGCGAATTA TTTTTGATGG 
CAACTACTTT CGACCGATGT CCTTCCGGTC TGCGCTTAAT AAAAACTACC 



+2 VNS AFHL WCN G.RW VG Y 



1801 CGTTAACTCG GCGTTTCATC TGTGGTGCAA CGGGCGCTGG GTCGGTTACG
GCAATTGAGC CGCAAAGTAQ ACACCACGTT GCCCGCGACC CAGCCAATGC 



+2GQDS RLP SEED L SA FLR 



1851 GCCAGGACAG TCGTTTGCCG TCTGAATTTG ACXTGAGCGC ATTTTTACGC 
CGGTCCTGTC AGCAAACGGC AGACTTAAAC TGGACTCGCG TAAAAATGCG 



+ 2 A G EN RLA VMV LRWS OGS 



1901 6CCGGAGAAA ACCGCCTCGC GGTGATGGTG CTGCGCTGGA GTGACGGCAG 
CGGCCTCTTT TGGCGGAGCG CCACTACCAC GACGCGACCT CACTGCCGTC 



+2 YLE DQ DM WRM SGI FRD 

1951 TTATCTGGAA GAT^AGGATA TGTGGCGGAT GAGCGGCATT TTCCGTGACG 
AATAGACCTT CTAGTCCTAT ACACCGCCTA CTCGCCGTAA AAGGCACTGC 



+2V SLL HK P TTQ'i sDF.HVA 



2001 TCTCGTTGCt GCATAAACCG ACTACACAAA TCAGCGATTT CCATGTTCCC 
AGAGCAACGA CGTATTTCGC TGATGTGTTT AGTCGCTAAA GGTACAACGG 



+2 T R r N b O T s'"'r"a V L E A E V Q 



2051 ACTCGCTTTA ATGATGATTT CAGCCGCGCT GTACTGGAGG CTGAAGTTCA 
TGAGCGAAAT TACTACTAAA GTCGGCGCGA CATGACCTCC feACTTCAAGT 



+2 MCG ELRD YLR VTV SLW 



2101 GATGTGCGGC GAGTTGCGTG ACTACCTACG GGTAACAGTT TCTTTATGGC 
CTACACGCCG CTTCAACGCAC 'TGATGGATGC CCATTGTCAA AGAAATACCG 



+2QGBT QVA SGTA PFG GEI- 



2151 AGGGTGAAAC GCAGGTCGCC AGCGGCRCCG CGCCTTTCGG CGGTGAAATT 
TCCCACTTTG OjTCCAGCGG TCGCCGTGGC GCGGAAAGCC GCCACTTTAA 



+2IDER GGY ADR VTLR LNV 



2201 ATCGATGAGC GTGGTGGTTA TGCCGATCGC GTCACACTAC GTCTGAACGT 
TACCTACTCG CACCACCAAT ACGGCTAGCG CAGTGTGATC CAGACTTCCA 



+2 ENP KLWS AEI PNL YRA 



2251 CGAAAACCCG AAACTGTGGA 5CGCCGAAAT CCCGAATCTC TATCGTGCGG 
GCTTTTGGGC TTTGACACCT CGCGGCTTTA G6GCTTAGRG ATAGCACGCC 
+2 VVEL HTA DGTL IEA EAC



2301 TGGTTGAACT GCACACCGCC GACGGCACGC TGATTGAAGC AGAAGCCTCC ' 
ACCAACTTGA CG'TGTGGCGG CTGCCGTGCG ACTAACTTCG TCTTCGGACG 



+ ^CVGF REV RIE NGIL LLN 



2351 GATGTCGGTT TCCGCGAGGT GCGGArTGAA AATGGTCTGC TGCTGCTGAA 
CTACAGCCAA AGGCGCTCCA CGCCTAACTT TTACCAGACG ACGACGACTT 



+2 GKP L LIR GVH RH2 HHP 



2401 CGGCAAGCCG TTGCTGATTC GAGGCGTTAA CCGTCACGAG CATCATCCTC 
GCCGTTCGGC AACGACTAAG. CTCCGCAATT GGCAGTGCTC GTAGTAGQAG 

+2LHGQ VMD EQTM VQ.- D ILL 

24 51 TGCATGGTCA GGTCATGGAT GAGCAGACGA TGGTGCAGGA TATCCTGCTG 
ACGTACCAGT CCAGTACCTA CTCGTCTGCT ACCACGTCCT ATAGGACGAC 

+ 2MKQ N MFN AV R CSHY PNH 



2501 ATGAAGCAGA ACAACTTTAA CGCCGTGCGC TGTTCGCATT ATCCGAACCA 
TACTTCGTCT TGTTGAAATT GCGGCACGCG ACAAGCGTAA TAGGCTTGGT 



+2 PLW YTLC DRY GLY VVD 



2551 TCCGCTGTGG TACACGCTGT GCGACCGCTA CGGCCTGCAT GTGGTGGATG 
AGGCGACACC ATGTGCGACA CGCTGGCGAT GCCGGACATA CACCACCTAC 



+2EANX, ETH GMV P M NR LTD 



2601 AAGCCAATAT TGAAACCCAC GGCRTGGTGC CAATGAATCG TCTCACCGAT 
TTCGGTTATA ACTTTGGGTG CCGTACCACG GTTACTTAGC AGACTGGCTA 



+2 0 P' R"H" r""P" A "M"SE RVTR MVQ 



2651 GATCCGCGCT 6GCTACCGGC GATGAGCGAA CGCGTAACGC GAATGGTGCA 
CTAGGCGCGA CCGATGGCCG CTACTCGCTT GCGCATTGCG CTTACCACGT 



+2 RDR NHPS VII WSL GNE 



2701 GCGCGATCGT AATCACCCGA GTGTGATCAT CTGGTCGCTG GGGAATGAAT 
C6CGCTAGCA rTAGTGGGCT CACACTAGTA GACCAGCCAC CCCTTACTTA 



+2S .GHG A NH DALY RWl KSV 



2751 CAGGCCACGG CGCTAATCAC GACGCGCTGT ATCGCTGGAT CAAATCTGTC 
GTCCGGTGCC GCGATTAGTG CTGC6CGACA TAGCGACCTA GTTTAGACAG 



+2DPSR PVQ YEG GGAD TTA 



2801 GATCCTTCCC GCCCGGTCCA GTArGAAGGC GGCGGACCCG ACACCACGGC 
CTAGGAAGGG CGGGCCACGT CATACTTCCG CCGCCTCGGC TGTGGTGCCG 



+2 TDI rCP M YAR VDE DQP 



2851 CACCGATATT ATTTGCCCGA TGTACGCGCG CGTGGAXGAA GACCAGCCCT 
GTGGCTATAA TAAACGGGCT ACATGCGCGC GCACCTACTT CTGGTCGGGA 
+2 FPAV PKW SIKK WLS LPG



2901 TCCCGGCTGT GCCGAAATGG TCCATCAAAA AATGGCTnC GCTACCTGGA 
ftGGGCCGACA CGGCTTTACC AGGTAGTTTT TTACCGAAfiG CGATGGACCT 



+2ETRP LIL CEY AHAN GNS 



2951 GAGACGCGCC CGCTGATCCT TTGCGAATAC GCCCACGCGA TGGGTAACAG 
CTCTGCGCGG GCGACTAGGA AACGCTTATG CGGGTGCGCT ACCCATTGTC 



+ 2 LGG FAKY HQA ? K Q YPR 



3001 TCTTGGCGGT TTCGCTAAAT ACTGGCAGGC GTTTCGTCAG TATCCCCGTT 
AGAACCGCCA AAGCGATTTA TGACCGTCCG CAAAGCAGTC ATACCCGCAA* 



+2LQGC FVW DWVD* QSL IKY 



3051 TACAGGGCGG CTTCGTCTGG GACrGGGTGG ATCAGTCC3CT GATTAAATAT 
ATGTCCCGCC GAAGCAGACC CTGACCCACC TAGTCAGOGA CTAATTTATA 



+2 DENG NPW SAY GGDF GOT 



3101 GATGAAAACG GCAACCCGTG GTCGGCTTAC GGCGGTGATT TTGGCGATAC 
CTACTTTTGC CGTTGGGCAC CAGCCGAATG CCGCCACTAA AACCGCTATG 



+2 PND RQFC MNG h V F ADR 



3151 GCCGAACGAT CGCCAGTTCT GTATGAflCGG .TCTGGTCTTT GCCGACCGCA 
CGGCTTGCTA GCGGTCAAGA CATACTTGCC ' AGACCAGAAA CGGCTGGC6T 



+2TPHP ALT EAKH QQ Q FFQ 



3201 CGCCGCATCC AGCGCTGACG GAAGCAAAAC ACCRGCAGCA GTTTTTCCAG ■ 
GCGGCGTAGG TC6CGACTGC CTTCGrTTTG TGGTCGTCGT QU^AAAGGTC 



+2-FRLS GQT l EV TSEY LFR 



3251 TTCCGTTTAT CCGGGCAAAC CATCGAAGTG ACCAGCGAAT ACCTGTTCCG 
AACGCAAATA GGCCCGTTTG GTAGCrxCAC TGGTCGCTTA TGGACAAGGC 



+2 H SD NELL HMM VAL DG K 



3301 TCATAGCGAT AACGAGCTCC TGCACrGGAT GGTGGCGCTG GATGGTAAGC 
AGTATCGCTA TTGCTCGAGG ACGTGACCTA CCACCGCGAC CTACCATTOG 



+2PLA5 GEV PLDV APQ GKQ 



3351 CGCTGGCAAG CGGTGAAGTG CCTCTGGATG TCGCTCCACA AGGTAAACAG 
CCGACCGTrC GCCACTTCAC GGAGACCTAC AGCGAGC3TGT XCCATTTGTC 



-«-2LIEL PEL PQP ESAG QLW 



34 01 ' TTGATTGAAC TGCCTGAACT ACCGCAGCCG GAGAGCGCCG GGCAACTCTG 
AACTAACTTG ACGGACTTGA TGGCGTCGGC CTCTCGCX3GC CCGTTGAGAC 



+2 LTV RVVQ PNA TAW SEA 



34 51 GCTCftCAGTA CGCGTAGTGC AflCCGAACGC CACCCCATGC TCAGAAGCCG 
CGAGTGTCAT GCGCATCACG rrGGCTTGCG CTGGCCTACC AGTCTTCCGC 
+2 GHIS QWRL AEN LSV



3501 GGCACATCAG CGCCTGGCAG CAGTGGCGTC TGGCGGAAAA CCTCAGTGTG 
CCGTGTAGTC GCGGACCGTC GTCACCGCAG ACCGCdTTT GGAGTCACAC 



12TLPA ASK AI P HLTT SEM 



3551 ACGCTCCCCG CCGCGTCCCA CGCCATCCCG CATCTGACCA CCAGCGAAAT 
TGCGAGGGGC GGCGCAGGGT GCGGTRGGGC GTAGACTGGT GGTCGCTTTA 



+2 DFC ISLG NKR WQF NRQ 



3601 GGRTTTTTGC ATCGAGCTGG GTAATAAGCG TTGGCAATTT AACCGCCAGT 
CCTAAAAACG TAGCTCGACC CAriATTCGC AACCGTTAAA TTGGCGGTCA 



+ 2SGFL SQM WIGD.KKQ LLT 



3651 CAGGCrTTCT TTCACAGATG TGGATTGGCG ATAAAAAACA ACTGCTGACG 
GTCCGAAAGA AAGTGTCTAC ACCTAACCGC TATTTTTTGT TGACGACTGC 



+ 2PLRD QTT RAP LDNO IGV 



3701 CCGCTGCGCG ATCAGTTCAC CCGTGCACCG CTGGATAACG ACATTGGCGT 
GGCGACGCGC TAGTCAAGTG GGCACGTGGC GACCTATTGC TGTAACCGCA 



+2 SEA TRID PNA WVS RWK 



3751 AAGTGAAGCG ACCCGCATTG ACCCTAACGC CTGGGTCGAA CGCTGGjAAGG 
TTCACTTCGC TGGGCGT AAC TGGGATTGCG GACCCAGCTT GCGACCTTCC 



+2AAGH YQA EA AL LQC TAD 



3801 CGGCGGGCCA TTACCAGGCC GAAGCAGCGT TGTTGCAGrG CACGGCAGAT 
GCCGCCCGGT AATGGTCCGG CnPCGTCGCA ACAACGTCAC GTGCCGTCTA 



+2T LA D AVL ITT AHAW QHQ 



3851 ACACTTGCTG ATGCGGTGCT GATTACGACC GCTCACGCGT GGCAGCATCA 
TGTGAACGAC TACGCCACGA CTAATGCTGG CGAGTGOQCA CCGTCGTAGT 



42 GKT LFIS RKT YRI DG3 



3901 GGGGAAAACC TTATTTATCA GCCGGAAAAC CTACCGGATT GATGGTAGTG 
CCCCnTTGG AATAAATAGT CGGCCTTTTG GATGGCCIAA CTACCATCAC 



+2GQMA ITV D VEV ASD TPH 



3951 GTCAAATGGC GATTACCGTT GATGTTGAAG TGGCGAGCGA TACACCGCAT 
CAGXriACCG CTAATGGCAA CTACAACTTC ACCGCTCGCT ATGTGGCGTA 



+ 2 PARI GLH CQL AQVA ERV 



4001 CCGGCGCGGA TTGGCCTGAA CTGCCAGCTG GCGCAGG7AG CAGAGCGGGT 
GGCCGCGCCT AACCGGACTT GACGGTCGAC CGCGTCCATC'* GTCTCGCCCA 



.-^2 NHL GLGP QEN YFD RLT 



4051 AAACTGGCTC GGATTAGGGC CGCAAGAAAA CTATCCCGAC CGCCTTACTG . 
TTTGACCGAG CCTAATCCCG GCGTrCTTTT GATAGGGCTG GCGGAATGAC 
+2 AACF DRW DLPL SDM YTP



4101 CCGCCrCTTT TGACCGCTGG GATCTGCCAT TGTCPiGACAT GTATACCCCG 
CGCGGACAAA. ACTGCCGACC CTAGACGGTA ACAGTCTGTA CATATGGGGC 



•t-2VVFP SEN CLR CGTR ELN 



4151 TACGTCTTCC CGAGCGAAAA CGGTCTGCGC TGCGGGACGC GCGAATTGAA 
ATGCAGAAGG GCTCGCTTTT GCCAGACGCG. ACGCCCTGCG CGCTTAACTT 



+ 2 YGP HQWR GDr QFN ISR 



4201 TTATGGCCCA CACCAGTGGC GCCGCGACTT CCAGTTCAAC ATCAGCCGCT 
AATACCGGGT GTGGTCACCG CGCCGCTGAA GGTCAAGTTG TAGTCGGCGA 



+ 2YSQ Q QLM ETSH RHL LHA 



4251 ACAGTCAACA GCAACTGATG GAAACCAGCC ATCGCCAtCX GCTGCACGCG 
TGTCAGTTGT CGTTGACTAC CTTTGGTCGG TAGCGGTAGA CGACGTGCGC . 



+2EE GT HLN I DG FHHG IGG 



4 301 GAAGAAGGCA CATGGCTGAA TATCGACGGT TTCCATATGG GGATTGGTGG 
CTTCTTCCCST 6TACCGACTT ATAGCTCqCA AAGGTATACC CCTAACCACC 



+2'DD3 WSPS V SA EFQ LSA 



4 351 -CGACGACTCC-TGGAGCCCGT CAGTATCGGC GGAATTCCAG . CTGAGCGCCG 
GCTGCTGAGG ACCTCGGGCA GTCATAGCCG CCTTAAGCTC GACTCGCGGC 



+2CRY H VOL VWCQ KRS DYK 



44 01 GTCGCTACCA TTACCAGTTG GTCTGGTGTC AAAAAAGATC TGACTATAAA 
CAGCGATGGT AATGGTCAAC CAGACCACAG TTTTTTCTAG ACTGATAriT 



+ 2DEDL DHH HHH HR 



4451 GATGAGGACC TCGACCATCA TCATCATCAT CACCGGTAAT AATAGGTAGA 
CTACTCCTGG AGCTGGTAGT AGTAGTAGTA GTGGCCATTA TTATCCATCT 



4501 TAAGTGACTG ATTAGATGCA TTGATCCCTC GACCAATTCC GGTTATTTTC 
ATTCACTGAC TAATCTACGT AACTAGGGAG CTGGTTAAGG CCAATAAAAG 



4551 CACCATATTG CCGTCTTTTG GCAATGTGAG GGCCCGGAAA CCTGGCCCTG 
GTGGTATAAC GGCAGAAAAC CGTTACACTC CCGGGCCTTT GGACCGGGAC 



4601 TCTTCTTGAC GAGCATTCCT AGGGGTCTTT CCCCTCTCGC CAAAGGAATG 
AGAAGAACTG CTCGTAAGGA TCCCCAGAAA GGGGAGAGCG GTTTCCTTAC 



4651 CAAGGTCTGT TGAATGTCGT GAAGGAAGCA GTTCCTCTGG AAGCTTCTTG 
GTTCCAGACA ACTTACAGCA CTTCCTTCGT CAAGGAGACC TICGAAGAAC 



4701 AAGACAAACA ACGTCT6TAG CGACCCTTTG CACGCAGCGG AACCCCCCAC 
TTCTGTTTGT TGCA6ACATC GCTGGGAAAC GTCCGTCGCC TTGGGGGGTG 



4751 CTGGCGACAG GTGCCTCTGC GGCCAAAAGC CACGTGTATA AGATACACCT
GACCGCTGTC CACGGAGACG CCGGTTTTCG GTGCACATAT TCTATGTGGA 
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4 601 GCAAAGGCGG CACAACCCCA CTGCCACGTT G?GAGTr3GA TAGTTGTGGA 
CGTTTCCGCC GTGTTGGGGT CACGGTGCAA CACTCAACCT ATCAACACCT



4851 AAGAGTCAAA TGGCTCTCCT CAAGCGTATT CAACAAGGGG CTGAAGGATG
TTCTCAGTTT ACCGAGAGGA GTTCGCATAA GTTGTTCCCC GACTTCCTAC



4901 CCCAGAAGGT ACCCCATTGT ATGGGATCTG ATCTGGGGCC TCGGTGCACA
GGGTCTTCCA TGGGGTAACA TACCCTAGAC TAGACCCCGG AGCCACGTGT



4951 TGCTTTACAT GTGTTTAGTC GAGGTTAAAA AACGTCTAGG CCCCCCGAAC
ACGAAATGTA CACAAATCAG CTCCAATTTT TTGCAGATCC GGGGGGCTTG 



5001 CACGGGGACG TGGTTTTCCT TTGAAAAACA CGATGATAAT ACCATGATTG
GTGCCCCTGC ACCAAAAGGA AACTTTTTGT GCTACTATTA TGGTACTAAC



5051 AACAAGATGG ATTGCACGCA GGTTCTCCGG CCGCTTGGGT GGAGAGGCTA
TTGTTCTACC TAACGTGCGT CCAAGAGGCC GGCGAACCCA CCTCTCCGAT



5101 TTCGGCTATG ACTGGGCACA ACAGACAATC GGCTGCTCTG ATGCCGCCGT
AAGCCGATAC TGACCCGTGT TGTCTGTTAG CCGACGAGAC TACGGCGGCA



5151 GTTCCGGCTG TCAGCGCAGG GGCGCCCGGT TCTTTTTGTC AAGACCGACC
CAAGGCCGAC AGTCGCGTCC CCGCGGGCCA AGAAAAACAG TTCTGGCTGG



5201 TGTCCGGTGC CCTGAATGAA CTGCAGGACG AGGCAGCGCG GCTATCGTGG
ACAGGCCACG GGACTTACTT GACGTCCTGC TCCGTCGCGC CGATAGCACC



5251 CTGGCCACGA CGGGCGTTCC TTGCGCAGCT GTGCTCGACG TTGTCACTGA
GACCGGTGCT GCCCGCAAGG AACGCGTCGA CACGAGCTGC AACAGTGACT 



5301 AGCGGGAAGG GACTGGCTGC TATTGGGCGA AGTGCCGGGG CAGGATCTCC
TCGCCCTTCC CTGACCGACG ATAACCCGCT TCACGGCCCC GTCCTAGAGG



5351 TGTCATCTCA CCTTGCTCCT GCCGAGAAAG TATCCATCAT GGCTGATGCA
ACAGTAGAGT GGAACGAGGA CGGCTCTTTC ATAGGTAGTA CCGACTACGT



5401 ATGCGGCGGC TGCATACGCT TGATCCGGCT ACCTGCCCAT TCGACCACCA
TACGCCGCCG ACGTATGCGA ACTAGGCCGA TGGACGGGTA AGCTGGTGGT 



5451 AGCGAAACAT CGCATCGAGC GAGCACGTAC TCGGATGGAA GCCGGTCTTG 
TCGCTTTGTA GCGTAGCTCG CTCGTGCATG AGCCTACCTT CGGCCAGAAC



5501 TCGATCAGGA TGATCTGGAC GAAGAGCATC AGGGGCTCGC GCCAGCCGAA
AGCTAGTCCT ACTAGACCTG CTTCTCGTAG TCCCCGAGCG CGGTCGGCTT



5551 CTGTTCGCCA GGCTCAAGGC GCGCATGCCC GACGGCGAGG ATCTCGTCGT
GACAAGCGGT CCGAGTTCCG CGCGTACGGG CTGCCGCTCC TAGAGCAGCA



5601 GACCCATGGC GATGCCTGCT TGCCGAATAT CATGGTGGAA AATGGCCGCT
CTGGGTACCG CTACGGACGA ACGGCTTATA GTACCACCTT TTACCGGCGA



5651 TTTCTGGATT CATCGACTGT GGCCGGCTGG GTGTGGCGGA CCGCTATCAG
AAAGACCTAA GTAGCTGACA CCGGCCGACC CACACCGCCT GGCGATAGTC



5701 GACATAGCGT TGGCTACCCG TGATATTGCT GAAGAGCTTG GCGGCGAATG
CTGTATCGCA ACCGATGGGC ACTATAACGA CTTCTCGAAC CGCCGCTTAC
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GGCTGACCGC TTCCTCGTGC TTTACGGTAT CGCCGCTCCC GATTCGCAGC 
CCGACTGGCG AAGGAGCACG AAATGCCATA GCGGCGAGGG CTAAGCGTCG


5801 GCATCGCCTT CTATCGCCTT CTTGACGAGT TCTTCTGAGC GGGACTCTGG
CGTAGCGGAA GATAGCGGAA GAACTGCTCA AGAAGACTCG CCCTGAGACC



5851 GGTTCGCATC GATAAAATAA AAGATTTTAT TTAGTCTCCA GAAAAAGGGG
CCAAGCGTAG CTATTTTATT TTCTAAAATA AATCAGAGGT CTTTTTCCCC



5901 GGAATGAAAG ACCCCACCTG TAGGTTTGGC AAGCTAGCTT AAGTAACGCC
CCTTACTTTC TGGGGTGGAC ATCCAAACCG TTCGATCGAA TTCATTGCGG



5951 ATTTTGCAAG GCATGGAAAA ATACATAACT GAGAATAGAG AAGTTCAGAT
TAAAACGTTC CGTACCTTTT TATGTATTGA CTCTTATCTC TTCAAGTCTA



6001 CAAGGTCAGG AACAGATGGA ACAGCTGAAT ATGGGCCAAA CAGGATATCT
GTTCCAGTCC TTGTCTACCT TGTCGACTTA TACCCGGTTT GTCCTATAGA



6051 GTGGTAAGCA GTTCCTGCCC CGGCTCAGGG CCAAGAACAG ATGGAACAGC
CACCATTCGT CAAGGACGGG GCCGAGTCCC GGTTCTTGTC TACCTTGTCG



6101 TGAATATGGG CCAAACAGGA TATCTGTGGT AAGCAGTTCC TGCCCCGGCT
ACTTATACCC GGTTTGTCCT ATAGACACCA TTCGTCAAGG ACGGGGCCGA



6151 CAGGGCCAAG AACAGATGGT CCCCAGATGC GGTCCAGCCC TCAGCAGTTT
GTCCCGGTTC TTGTCTACCA GGGGTCTACG CCAGGTCGGG AGTCGTCAAA



6201 CTAGAGAACC ATCAGATGTT TCCAGGGTGC CCCAAGGACC TGAAATGACC
GATCTCTTGG TAGTCTACAA AGGTCCCACG GGGTTCCTGG ACTTTACTGG



6251 CTGTGCCTTA TTTGAACTAA CCAATCAGTT CGCTTCTCGC TTCTGTTCGC
GACACGGAAT AAACTTGATT GGTTAGTCAA GCGAAGAGCG AAGACAAGCG



6301 GCGCTTCTGC TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG
CGCGAAGACG AGGGGCTCGA GTTATTTTCT CGGGTGTTGG GGAGTGAGCC



6351 GGCGCCAGTC CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT
CCGCGGTCAG GAGGCTAACT GACTCAGCGG GCCCATGGGC ACATAGGTTA



6401 AAACCCTCTT GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG
TTTGGGAGAA CGTCAACGTA GGCTGAACAC CAGAGCGACA AGGAACCCTC



6451 GGTCTCCTCT GAGTGATTGA CTACCCGTCA GCGGGGGTCT TTCATTCATG
CCAGAGGAGA CTCACTAACT GATGGGCAGT CGCCCCCAGA AAGTAAGTAC



6501 CAGCATGTAT CAAAATTAAT TTGGTTTTTT TTCTTAAGTA TTTACATTAA
GTCGTACATA GTTTTAATTA AACCAAAAAA AAGAATTCAT AAATGTAATT



6551 ATGGCCATAG TTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT
TACCGGTATC AACGTAATTA CTTAGCCGGT TGCGCGCCCC TCTCCGCCAA



6601 TGCGTATTGG CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG
ACGCATAACC GCGAGAAGGC GAAGGAGCGA GTGACTGAGC GACGCGAGCC



6651 TCGTTCGGCT GCGGCGAGCG GTATCAGCTC ACTCAAAGGC GGTAATACGG
AGCAAGCCGA CGCCGCTCGC CATAGTCGAG TGAGTTTCCG CCATTATGCC
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1 CTGCAGCCTG AATATGGGCC AAACAGGATA TCTGTGGTAA GCAGTTC^G 
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT 

101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA 



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT 
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG 



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC



501 GAGACCCCTG CCCAGCCACC ACCCACCCAC CACCGGGAGG CAAGCTGGCC 
CTCTGGGGAC GGCTCCCTCG TCGCTGCGTG GTGGCCCTCC GTTCGACCGG 



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT 



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC 



701 TCCCAGGGAC TTTGGGGGCC GTTTTTCTGG CCCGACCTGA GGAAGGGAGT 
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA 



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT 



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG
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951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATGTCGAG CGGCTCGCTC 
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT 



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAACACCA
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTGTGGT 



1401 TGCACCATCA TCATCATCAC GTCGACTATA AAGATGACGA CCTCGAGATG
ACGTGGTAGT AGTAGTAGTG CAGCTGATAT TTCTACTGCT GGAGCTCTAC



1451 GGCGTGATTA CGGATTCACT GGCCGTCGTG GCCCGCACCG ATCGCCCTTC
CCGCACTAAT GCCTAAGTGA CCCGCAGCAC CGGCCGTGGC TAGCGGGAAG 



1501 CCAACAGTTA CGCAGCCTGA ATGGCGAATG GCGCTTTGCC TGGTTTCCGG
GGTTGTCAAT GCGTCGGACT TACCGCTTAC CGCGAAACGG ACCAAAGGCC



1551 CACCAGAAGC GGTGCCGGAA AGCTGGCTGG AGTGCGATCT TCCTGAGGCC
GTGGTCTTCG CCACGGCCTT TCGACCGACC TCACGCTAGA AGGACTCCGG



1601 GATACTGTCG TCGTCCCCTC AAACTGGCAG ATGCACGGTT ACGATGCGCC
CTATGACAGC AGCAGGGGAG TTTGACCGTC TACGTGCCAA TGCTACGCGG



1651 CATCTACACC AACGTGACCT ATCCCATTAC GGTCAATCCG CCGTTTGTTC
GTAGATGTGG TTGCACTGGA TAGGGTAATG CCAGTTAGGC GGCAAACAAG



1701 CCACGGAGAA TCCGACGGGT TGTTACTCGC TCACATTTAA TGTTGATGAA
GGTGCCTCTT AGGCTGCCCA ACAATGAGCG AGTGTAAATT ACAACTACTT



1751 AGCTGGCTAC AGGAAGGCCA GACGCGAATT ATTTTTGATG GCGTTAACTC
TCGACCGATG TCCTTCCGGT CTGCGCTTAA TAAAAACTAC CGCAATTGAG



1801 GGCGTTTCAT CTGTGGTGCA ACGGGCGCTG GGTCGGTTAC GGCCAGGACA
CCGCAAAGTA GACACCACGT TGCCCGCGAC CCAGCCAATG CCGGTCCTGT 



1851 GTCGTTTGCC GTCTGAATTT GACCTGAGCG CATTTTTACG CGCCGGAGAA
CAGCAAACGG CAGACTTAAA CTGGACTCGC GTAAAAATGC GCGGCCTCTT
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1501 AACCGCCTCG CGGTGATGGT GCTHCGCTGC AGTGACGGCA GTTATCTGGA 
TTGGCGGAGC GCCACTACCA CGACGCGACC TCACTGCCGT CAATAGACCT



1951 AGATCAGGAT ATGTGGCGGA TGAGCGGCAT TTTCCGTGAC GTCTCGTTGC 
TCTAGTCCTA TACACCGCCT ACTCGCCGTA AAAGGCACTG CAGAGCAACG 



2001 TGCATAAACC GACTACACAA ATCAGCGATT TCCATGTTGC CACTCGCTTT
ACGTATTTGG CTGATGTGTT TAGTCGCTAA AGGTACAACG GTGAGCGAAA



2051 AATGATGATT TCAGCCGCGC TGTACTGGAG GCTGAAGTTC AGATGTGCGG
TTACTACTAA AGTCGGCGCG ACATGACCTC CGACTTCAAG TCTACACGCC



2101 CGAGTTGCGT GACTACCTAC GGGTAACAGT TTCTTTATGG CAGGGTGAAA 
GCTCAACGCA CTGATGGATG CCCATTGTCA AAGAAATACC GTCCCACTTT 



2151 CGCAGGTCGC CAGCGGCACC GCGCCTTTCG GCGGTGAAAT TATCGATGAG 
GCGTCCAGCG GTCGCCGTGG CGCGGAAAGC CGCCACTTTA ATAGCTACTC



2201 CGTGGTGGTT ATGCCGATCG CGTCACACTA CGTCTGAACG TCGAAAACCC
GCACCACCAA TACGGCTAGC GCAGTGTGAT GCAGACTTGC AGCTTTTGGG



2251 GAAACTGTGG AGCGCCGAAA TCCCGAATCT CTATCGTGCG GTGGTTGAAC
CTTTGACACC TCGCGGCTTT AGGGCTTAGA GATAGCACGC CACCAACTTG



2301 TGCACACCGC CGACGGCACG CTGATTGAAG CAGAAGCCTG CGATGTCGGT
ACGTGTGGCG GCTGCCGTGC GACTAACTTC GTCTTCGGAC GCTACAGCCA 



2351 TTCCGCGAGG TGCGGATTGA AAATGGTCTG CTGCTGCTGA ACGGCAAGCC 
AAGGCGCTCC ACGCCTAACT TTTACCAGAC GACGACGACT TGCCGTTCGG 



2401 GTTGCTGATT CGAGGCGTTA ACCGTCACGA GCATCATCCT CTGCATGGTC
CAACGACTAA GCTCCGCAAT TGGCAGTGCT CGTAGTAGGA GACGTACCAG 



2451 AGGTCATGGA TGAGCAGACG ATGGTGCAGG ATATCCTGCT GATGAAGCAG
TCCAGTACCT ACTCGTCTGC TACCACGTCC TATAGGACGA CTACTTCGTC 



2501 AACAACTTTA ACGCCGTGCG CTGTTCGCAT TATCCGAACC ATCCGCTGTG
TTGTTGAAAT TGCGGCACGC GACAAGCGTA ATAGGCTTGG TAGGCGACAC



2551 GTACACGCTG TGCGACCGCT ACGGCCTGTA TGTGGTGGAT GAAGCCAATA 
CATGTGCGAC ACGCTGGCGA TGCCGGACAT ACACCACCTA CTTCGGTTAT 



2601 TTGAAACCCA CGGCATGGTG CCAATGAATC GTCTGACCGA TGATCCGCCC 
AACTTTCGGT GCCGTACCAC GGTTACTTAG CAGACTGCCT ACTAGGCGCG 



2651 TGGCTACCGG CGATGAOCGA ACGCGTAACG CGAATGGTGC AGCGCGATCG 
ACCGATCGCC CCTACTCGCT TGCGCATTGC GCTTACCACG TCGCGCTACC 



2701 * TAATCACCCG AGTGTGATCA TCTGGTCGCT GGGGAATGAA TCAGGCCACG 
ATTAGTGGGC TCACACTAGT AGACCAGCGA CCCCTTACTT AGTCCGGTGC 



2751 GCGCTAATCA CGACGCGCTG TATCGCrCGA TCAAATCTCT CGATCCTTCC 
CGCGRTTAGT GCTGCGCGAC ATAGCGACCT AGTTTAGACA GCTAGGAAGQ 



2801 CGCCCGGTGC AGTATGAAGG CGGCGGAGCC GACACCACGG CCACCGATAl 
GCG6GCCACG TCATACTTCC GCCGCCTCGG CTGTGGTQCC GGTGGCTATA 
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2351 TATTTGCCCG ATGTACGCGC GCGTGGATGA AGACCAGCCC TTCCCGGCTG 
ATAAACGGGC TACATGCGCG CGCACCTACT TCTGGTCGGG AAGGGCCGAC 



2901 TGCCGAAATG GTCCATCAAA AAATGGCTTT CGCTACCTGG AGAGACGCGC
ACGGCTTTAC CAGGTAGTTT TTTACCGAAA GCGATGGACC TCTCTGCGCG 



2951 CCGCTGATCC TTTGCGAATA CGCCCACGCG ATGGGTAACA GTCTTGGCGG
GGCGACTAGG AAACGCTTAT GCGGGTGCGC TACCCATTGT CAGAACCGCC



3001 TTTCGCTAAA TACTGGCAGG CGTTTCGTCA GTATCCCCGT TTACAGGGCG 
AAAGCGATTT ATGACCGTCC GCAAAGCAGT CATAGGGGCA AATGTCCCGC



3051 GCTTCGTCTG GGACTGGGTG GATCAGTCGC TGATTAAATA TGATGAAAAC 
CGAAGCAGAC CCTGACCCAC CTAGTCAGCG ACTAATTTAT ACTACTTTTG



3101 GGCAACCCGT GGTCGGCTTA CGGCGGTGAT TTTGGCGATA CGCCGAACGA 
CCGTTGGGCA CCAGCCGAAT GCCGCCACTA AAACCGCTAT GCGGCTTGCT 



3151 TCGCCAGTTC TGTATGAACG GTCTGGTCTT TGCCGACCGC ACGCCGCATC 
AGCGGTCAAG ACATACTTGC CAGACCAGAA ACGGCTGGCG TGCGGCGTAG 



3201 CAGCGCTGAC GGAAGCAAAA CACCAGCAGC AGTTTTTCCA GTTCCGTTTA 
GTCGCGACTG CCTTCGTTTT GTGGTCGTCG TCAAAAAGGT CAAGGCAAAT



3251 TCCGGGCAAA CCATCGAAGT GACCAGCGAA TACCTGTTCC GTCATAGCGA 
AGGCCCGTTT GGTAGCTTCA CTGGTCGCTT ATGGACAAGG CAGTATCGCT 



3301 TAACGAGCTC CTGCACTGGA TGGTGGCGCT GGATGGTAAG CCGCTGGCAA 
ATTGCTCGAG GACGTGACCT ACCACCGCGA CCTACCATTC GGCGACCGTT



3351 GCGGTGAAGT GCCTCTGGAT GTCGCTCCAC AAGGTAAACA GTTGATTGAA
CGCCACTTCA CGGACACCTA CAGCGAGGTG TTCCATTTGT CAACTAACTT 



3401 CTGCCTGAAC TACCGCAGCC GGAGAGCGCC GGGCAACTCT GGCTCACAGT
GACGGACTTG ATGGCGTCGG CCTCTCGCGG CCCGTTGAGA CCGAGTGTCA 



3451 ACGCGTAGTG CAACCGAACG CGACCGCATG GTCAGAAGCC GGGCACATCA
TGCGCATCAC GTTGGCTTGC GCTGGCGTAC CAGTCTTCGG CCCGTGTAGT 



3501 GCGCCTGGCA GCAGTGGCGT CTGGCGGAAA ACCTCAGTGT GACGCTCCCC
CGCGGACCGT CGTCACCGCA GACCGCCTTT TGGAGTCACA CTGCGAGGGG 



3551 GCCGCGTCCC ACGCCATCCC GCATCTGACC ACCAGCGAAA TGGATTTTTG 
CGGCGCAGGG TGCGGTAGGG CGTAGACTGG TGGTCGCTTT ACCTAAAAAC



3601 CATCGAGCTG GGTAATAAGC GTTGGCAATT TAACCGCCAG TCAGGCTTTC
GTAGCTCGAC CCATTATTCG CAACCGTTAA ATTGGCGGTC AGTCCGAAAG



3651 TTTCACAGAT GTGGATTGGC GATAAAAAAC AACTGCTGAC GCCGCTGCGC 
AAAGTGTCTA CACCTAACCG CTATTTTTTG TTGACGACTG CGGCGACGCG



3701 GATCAGTTCA CCCGTGCACC GCTGGATAAC GACATTGGCG TAAGTGAAGC 
CTAGTCAAGT GGGCACGTGG CGACCTATTG CTGTAACCGC ATTCACTTCG 



3751 GACCCGCATT GACCCTAACG CCTGGGTCGA ACGCTGGAAG GCGGCGGGCC 
. CTGGGCGTAA CTCCGATTGC CGACCCACCT TGCGACOTC CGCCGCCCCC 
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3801 ATTACCAGGC CGAAGCAGCG TTGTTGCAGT GCACGGCAGA TACACTTGCT 
TAATGGTCCG GCTTCGTCGC AACAACGTCA CGTGCCGTCT ATGTGAACGA 



3851 GATGCGGTGC TGATTACGAC CGCTCACGCC TGGCAGCATC AGGGGAAAAC
CTACGCCACG ACTAATGCTG GCGAGTGCGC ACCGTCGTAG TCCCCTTTTG 



3901 CTTATTTATC AGCCGGAAAA CCTACCGGAT TGATGGTACT GGTCAAATGG 
GAATAAATAG TCGGCCTTTT GGATGGCCTA ACTACCATCA CCAGTTTACC 



3951 CGATTACCGT TGATGTTGAA GTGGCGAGCG ATACACCGCA TCCGGCGCGG
GCTAATGGCA ACTACAACTT CACCGCTCGC TATGTGGCGT AGGCCGCGCC



4001 ATTGGCCTGA ACTGCCAGCT GGCGCAGGTA GCAGAGCGGG TAAACTGGCT 
TAACCGGACT TGACGGTCGA CCGCGTCCAT CGTCTCGCCC ATTTGACCGA



4051 CGCATtAGGG CCGCAAGAAA ACTATCCCGA CCGCCTTACT GCCGCCTCTT 
GCCTAATCCC GGCGTTCTTT TGATAGGGCT GGCGGAATCA CGGCGGACAA 



4101 TTGACCGCTG- CGATCTGCCA TTGTCAGACA TGTATACCCC GTACGTCTTC 
AACTGGCGAC CCTAGACGGT AACAGTCTGT ACATATGGCC CATGCAGAAG 



4151 CCGAGCGAAA ACGGTCTGCG CPGCGGGACG CGCCAATTGA ATTATGGCCC 
GGCTCGCTTTC TGCCAGACGC CACGCCCTGC CCGCTTAACT TAATACCGGG 



4201 ACACCAGTGG CGCGGCGACT TCCAGTTCAA CATCAGCCGC TACAGTCAAC 
TGTGGTCACC GCGCCGCTGA AGGTCAAGTT CTAGTCGGCG ATGTCAGTTG 



4251 AGCAACTGAT GGAAACCAGC CATCGCCATC TGCTGCAOGC GGAAGAAGGC 
TCGTTGACTA CCTTTGGTCG GTAGCGGTAG ACGACGTQCG CCTTCTTCCG 



4301 ACATGGCTGA ATATCGACGG TTTCCATATG GGGATTGGTG GCGACGACTC 
TGTACCGACT TATAGCTGCC AAA6GTATAC CCCTAACCAC CGCTGCTGAG 



-4-351 -GTGGAGCCCG^CAGTATCGG-eGGAATTCCA GCT6AGC5CC CGTCGCTACC 
GACCTCGGGC A6TCATAGCG GCCTTAAGGT CGACtCGCQG CCAGCGATGG 



4 401 ATTACCACrr GGTCTGGTGT CAAAAAAGAT CTGGAGGTGG TGGCAGCAGG 
TAATGGTCAA CCAGACCACA GTTTTrTCTA GACCTCCACC ACCGTCGTCC 



4 451 CCTTGGCCCG CCGGATCCTT AATTAACAAT T6ACCGGTAA TAATAGGTAG 
GGAACCGCGC GGCCTAGGAA TTAATTGTTA ACTGGCCATT ATTATCCATC 



4 501 ATAAGTGACT GATTAGATGC ATTGATCCCT CGACCAATTC CGGTTATTTT 
TATTCACTGA CTAATCTACG TAACTAGGGA GCTGGTTAAG GCCAATAAAA 



4551 CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA ACCTGGCCCT 
GGTGGTATAA CGGCRGAAAA CXGTTACACT CCCGGGCCTT TGGACCCGGA 



4 601 GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTOG CCAAAGGAAT 
CAGAAGAACT GCTCGTAAGG ATCCCCAGAA ACGGGAGAGC GGTTTCCTTA 



4 651 GCAAGGTCTG TTGAATGTCG TGAAGGAAGC AGTTCCTCrG GAACCTTCTT 
CGTTCCACAC AACTTACAGC ACTTCCTTCG TCAAGGAGAC CTTCGAAGAA 



4701 GAAGACAAAC AACGTCTGTA GCGACCCTTT GCAGGCAGCG GAACCCCCCA 
CTTCTGTTTG TTGCAGACAT CGCTGGGAAA CGTCCGTCGC CTTGGGGGGT 
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4751 CCTGGCGACA GGTGCCTCTG CGGCCAAAAG CCACGTGTA? AAGATACACC 
GGACCGCTGT CCACGGAGAC GCCGGTTTTC GGTGCACATA TTCTATGTGG 



4801 TGCAAAGGCG GCACAACCCC AGTGCCACGT TGTGAGTTGG ATAGTTGTGG
ACGTTTCCGC CGTGTTGGGG TCACGGTGCA ACACTCAACC TATCAACACC



4851 AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG GCTGAAGGAT 
TTTCTCAGTT TACCGAGAGG AGTTCGCATA AGTTGTTCCC CGACTTCCTA



4 901 GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC CTCGGTGCAC 
CGGGTCTTCC ATGGGGTAAC ATACCCTAGA CTAGACCCCG GAGCCACGTG



4951 ATGCTTTACA TGTGTTTAGT CGAGGTTAAA AAACGTCTAG GCCCCCCGAA
TACGAAATGT ACACAAATCA GCTCCAATTT TTTGCAGATC CGGGGGGCTT 



5001 CCACGGGGAC GTGGTTTTCC TTTGAAAAAC ACGATGATAA TACCATGATT 
GGTGCCCCTG CACCAAAAGG AAACTTTTTG TGCTACTATT ATGGTACTAA 



5051 GAACAAGATG GATTGCACCC AGCTTCTCCG GCCGCTTGGG TGGAGAGGCT 
CTTGTTCTAC CTAACGTGCG TCCAAGAGGC CGGCGAACCC ACCTCTCCGA 



5101 ATTCGGCTAT GACTGGGCAC AACAGACAAT CGGCTGCTCT GATGCCGCCG
TAAGCCGATA CTGACCCGTG TTGTCTGTTA GCCGACGAGA CTACGGCGGC



5151 TGTTCCGGCT GTCAGCGCAG GGGCGCCCGG TTCTTTTTGT CAAGACCGAC 
ACAAGGCCGA CAGTCGCGTC CCCGCGGGCC AAGAAAAACA GTTCTGGCTG 



5201 CTGTCCGGTG CCCTGAATGA ACTGCAGGAC GAGGCAGCGC GGCTATCGTG
GACAGGCCAC GGGACTTACT TGACGTCCTG CTCCGTCGCG CCGATAGCAC



5251 GCTGGCCACG ACGGGCGTTC CTTGCGCAGC TGTGCTCGAC GTTGTCACTG
CGACCGGTGC TGCCCGCAAG GAACGCGTCG ACACGAGCTG CAACAGTGAC



5301 AAGCGGGAAG GGACTGGCTG CTATTGGGCG AAGTGCCGGG GCAGGATCTC
TTCGCCCTTC CCTGACCGAC GATAACCCGC TTCACGGCCC CGTCCTAGAG



5351 CTGTCATCTC ACCTTGCTCC TGCCGAGAAA GTATCCATCA TGGCTGATGC 
GACAGTAGAG TGGAACGAGG ACGGCTCTTT CATAGGTAGT ACCGACTACG 



5401 AATGCGGCGG CTGCATACGC TTGATCCGGC TACCTGCCCA TTCGACCACC
TTACGCCGCC GACGTATGCG AACTAGGCCG ATGGACGGGT AAGCTGGTGG



5451 AAGCGAAACA TCGCATCGAG CGAGCACGTA CTCGGATGGA AGCCGGTCTT
TTCGCTTTGT AGCGTAGCTC GCTCGTGCAT GAGCCTACCT TCGGCCAGAA 



5501 GTCGATCAGG ATGATCTGGA CGAAGAGCAT CAGGGCCTOG CGCCAGCCGA 
CAGCTAGTCC TACTACACCT GCTTCTCGTA GTCCCCGAOC GCGCTCGGCT 



5551 ACTGTTCGCC AGGCTCAAGG CGCGCATGCC CGACGGCGAG GATCTCGTCG 
TGACAAGCGG TCCGAGTTCC GCGCGTACGG GCTGCCGCTC CTAGAGCAGC 



5601 TGACCCATGG CGATCCCTGC TTGCCGAATA rCATGGTGGA AAATGGCCGC 
ACTGGGTACC GCTACGGACG AACGGCTTAT AGTACCACCT TTTACCGGCG 



5651 TTTTCTGGAT TCATCGACTG TGGCCGGCTG GGTGTGGCGG ACXGCTATCA 
AAAAGACCTA AGTAGCTGAC ACCGGCCGAC CCACACCGCC TGGCGATAGT 
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5701 GGACATAGCG TTGGCTACCC GTGATATTGC TGAAGAGCTT GGCGGCGAAT 
CCTGTATCGC AACCGATGGG CACTATAACG ACTTCTCGP.A CCGCCGCTTA 



5751 GGGCrCACCC CTTCCTCCTC CTTTACGGTA TCGCCGCTCC CGATTCGCAG 
CCCGACTGGC GAAGCACCAC GAAATGCCAT AGCGGCGAGG GCTAAGCGTC 



5801 CGCATCGCCT TCTATCGCCT TCTTGACGAG TTCTTCTGAG CGGGACTCTG 
GCGTAGCGGA AGATAGCGGA AGAACTGCTC AAGAAGACIC GCCCTGAGRC 



5851 G6GTTCGCAT CGATAAAATA AAAGATTTTA TTTAGTCrCC AGAAAAAGGG 
CCCAAGCGTA GCTATTTTAT TTTCTAAAAT AAATCAGAGG rCTTTTTCCC 



5901 GGGAATCAAA GACCCCACCT GT7VGGTTTGG CAAGCTAGCT TAAGTAACGC 
CCCTTACTTT CTGGGGTGGA CATCCAAACC GTTCGATCGA ATTCATTGCG 



5951 CATTTTGCAA GGCATGGAAA AATACATAAC TGAGAATAGA GAAGTTCAGA
GTAAAACGTT CCGTACCTTT TTATGTATTG ACTCTTATCT CTTCAAGTCT



6001 TCAAGGTCAG GAACAGATGG AACAGCTGAA TATGGGCCAA ACAGGATATC
AGTTCCAGTC CTTGTCTACC TTGTCGACTT ATACCCGGTT TGTCCTATAG



6051 TGTGGTAAGC AGTTCCTGCC CCGGCTCAGG GCCAAGARCA GATGGAACAG 
ACACCATTCG TCAAGGACGG GGCCGACTCC CGGTTCTTCT CTACCTTGTC 



6101 CTGAATATGG GCCAAACAGG ATATCTGTGG TAAGCACTTC CTCCCCCGGC 
GACTTATACC CGGTTTGTCC TATAGACACC ATTCGTCAAS GACGGGGCCG 



6151 TCAGGGCCAA GAACAGATGG TCCCCAGATG CGGTCCAGCC CTCAGCAGTT 
AGTCCCGGTT CTTGTCTACC AGGGGTCTAC GCCAGGTCGG GAGTCGTCAA 



6201 TCTA6AGAAC CATCAGATGT TTCCAGGGTG CCCCAAGGAC CTGAAATGAC 
AGATCTCTTG GTAGTCTACA AAGGTCCCAC GGGGTTCCTG GACTTTACTG 



6251 CCTGTGCCTT ATTTGAACTA ACCAATCAGT TCGCTTCTCG CTTCTGTTCG
GGACACGGAA TAAACTTGAT TGGTTAGTCA AGCGAAGAGC GAAGACAAGC 



6301 CCCGCTTCTG CTCCCCGAGC- TCAATAAAAG AGCCCACAAC CCCTCACTCG 
GCGCGAAGAC GAGGGGCTCG AGTTATTTTC TCGGGTGTTG GGGAGTGAGC* 



6351 GGGCGCCAGT CCTCCGATTG ACTGAGTCGC CCGGGTACCC GTGTATCCAA 
CCCGCGGTCA GGAGGCTAAC TGACTCAGCG GGCCCATGGG CACATAGGTT 



6401 TAAACCCTCT TGCAGTTGCA TCCGACTTGT GGTCTCGCTG TTCCTTGGGA 
ATTTGGGAGA ACGTCAACGT AGGCTGAACA CCAGAGCGAC AAGGAACCCT 



6451 GGGTCTCCTC TGAGTGATTG ACTACCCGTC AGCGGGGGTC TTTCATTCAT 
CCCAGAGGAG ACTCACTAAC TGATGGGCAG TCGCCCCCAG AAAGTAAGTA 



6501 GCAGCATGTA- TCAAAATTAA TTTGGTTTTT TTTCTTAAGT ATTTACATTA 
CGTCGTACAT AGTTTTAATT AAACCAAAAA AAAGAATTCA TAAATGTAAT 



6551 AATGGCCATA GTTGCATTAA TGAATCGGCC AACGCGCGGG GAGAGGCGGT 
TTACCG6TAT CAACGTAATT ACTTAGCCGG TTGCGCGCCC CTCTCCGCCA 



6601 TTGCGTATTG GCGCTCTTCC GCTTCCTCGC TCACTGACTC GCTGCGCTCG 
AACGCATAAC CGCGAGAAGG CGAAGGAGCG AGTGACT6AG CGACGCGAGC 
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6651 GTCGTTCGGC TGCGGCGAGC GGTATCAGCT CACTCAAAGG CGGTAATACG 
CAGCAAGCCG ACGCCGCTCG CCATAGTCGA GTGAGTTTCC GCCATTATGC 



6701 GTTATCCACA GAATCAGGGG ATAACGCAGG AAAGAACATG TGAGCAAAAG
CAATAGGTGT CTTAGTCCCC TATTGCGTCC TTTCTTGTAC ACTCGTTTTC



6751 GCCAGCAAAA GGCCAGGAAC CGTAAAAAGG CCGCGTTGCT GGCGTTTTTC
CGGTCGTTTT CCGGTCCTTG GCATTTTTCC GGCGCAACGA CCGCAAAAAG 



6801 CAIAGGCTCC GCCCCCCTGA CGAGCATCAC AAAAATC6AC GCTCAAGTCA 
GTATCCGAGG CGGGGGGACT GCTCGTAGTG TTTTTAGCTG CGAGTTCAGT 



6851 GAGGTGGCGA AACCCGACAG GACTATAAAG ATACCACGCC TTTCCCCCTC 
CTCCACCGCT TTGGGCTGTC CTGATATTTC TArGGTCCCC AAAGGGCGAC 



6901 GAAGCTCCCT CGTGCGCTCT CCTGTTCCGA CqCTGCCGCT TACCGGATAC 
CrrCGAGGGA GCACGCGAGA GGACAAGGCT GGGACGGCGA ATGGCCTATG 



6951 CTGTCCGCCT TTCTCCCTTC GGGAAGCGTG GCGCTTTCrC ATA6CTCACG 
GACAGGCGGA AAGA6GGAA6 CCCTTCGCAC CGCGAAAGAG TATCGAGTGC 



7001 CTGTAGGTAT CTCAGTTCGG TGTAGGTCGT TCGCTCCAAG CTGGGCTGTG 
GACATCCATA GAGTCAAGCC RCATCCAGCA AGCGAGGTTC GACCC6ACAC 



•7051 TGCACGAACC CCCCGTTCAG CXCGACCGCT GCGCCTTATC CGGTAACTAT 
ACGTGCTTGG GGGGCAAGTC GGGCTGGCGA CGCGGAATAG GCCATTGATA 



"7101 CGTCTTGAGT CCAACCCGGT AAGACACGAC TTATCGCCAC TGGCAGCAGC 
GCAGAACTCA GGTTGGGCCA TTCTGrGCTG AATAGCGGTG ACCGTCGTCG 



7151 CACTGGTAAC AGGATTAGCA. GAGCGAGGTA TGTAGGCGGT GCTACAOIGT 
GTGACCATTG TCCTAATCGT CTCGCTOCAT ACATCCGCCA CGATGTCTCA 



7201 TCTTGAAGTG GTGGCCTAAC TACGGCTACA CTAGAAGAAC AGTATTTGGT 
AGAACTTCAC CACCGGATTG AXGCCGATGT GATCTTCTTG TCATAAACCA 



7251 ATCTGCGCTC TGCTGAAGCC AGTTACCTTC GGAAAAAGAG TTGGTAGCTC 
TAGACGCGAG ACGACTTCGG TCAATGGAAG CCTTTTTCTC AACCATCGAG 



7301 TTGATCCGGC AAACAAACCA CCGCTGGTAG CGGTGGTTTT TTTGTTTGCA 
AACTAGGCCG 7TTGTTTGG7 GGCGACCATC GCCACCAAAA AAACAAAC6T 



7351 AGCAGCAGAT TACOCGCAGA AAAAAAGGAT CTCAAGAAGA TCCTTTGATC 
TCGTCGTCTA ATGOGOGTCT TTtTTTCCTA GAGTTCTTCT AGGAAACTAG 



7401 TTTTCTACGG GGTCTGACGC rCAGTGGAAC GAAAACTCAC 6TTAAGGGAT 
AAAAGATGCC CCAGRCTGCG AGICACCTTG CTTTTGAGTG CAATTCCCTA 



7451 TTTGGTCATG AGATTATCAA AAAGGATCTT CAOCTAGATC CTTTTGCGGC 
AAACCAGTAC TCTAATAGTT TTTCCTAGAA GTGGATCTAG GAAAACGCCG 



7501 CGCAAATCAA TCTAAAGTAT ATAIGAGTAA ACTTGGtCTG ACAGTTACCA 
GCGTTTAGTT AGATTTCATA TAXACTCATT TGAACCAGAC TGTCAATGGT 



7551 ATCCTTAATC AGTGAGGCAC CTATCTCAGC GATCTGTCTA TTTCGTTCAT 
TACGAATTAG TCACTCCGTG GATAGAGTCG CTAGACAGAT AAA6CAAGTA 
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7601 


CCATAGTTGC CTGACTCCCC GTCGTGTAGA TAACTACGAT ACGGGAGGGC
GGTATCAACG GACTGAGGGG CAGCACATCT ATTGATGCTA TGCCCTCCCG



7651 TTACCATCTG GCCCCAGTGC TGCAATGATA CCGCGAGACC CACGCTCACC
AATGGTAGAC CGGGGTCACG ACGTTACTAT GGCGCTCTGG GTGCGAGTGG



7701 GGCTCCAGAT TTATCAGCAA TAAACCAGCC AGCCGGAAGG GCCGAGCGCA
CCGAGGTCTA AATAGTCGTT ATTTGGTCGG TCGGCCTTCC CGGCTCGCGT



7751 GAAGTGGTCC TGCAACTTTA TCCGCCTCCA TCCAGTCTAT TAATTGTTGC
CTTCACCAGG ACGTTGAAAT AGGCGGAGGT AGGTCAGATA ATTAACAACG



7801 CGGGAAGCTA GAGTAAGTAG TTCGCCAGTT AATAGTTTGC GCAACGTTGT
GCCCTTCGAT CTCATTCATC AAGCGGTCAA TTATCAAACG CGTTGCAACA



7851 TGCCATTGCT ACAGGCATCG TGGTGTCACG CTCGTCGTTT GGTATGGCTT
ACGGTAACGA TGTCCGTAGC ACCACAGTGC GAGCAGCAAA CCATACCGAA



7901 CATTCAGCTC CGGTTCCCAA CGATCAAGGC GAGTTACATG ATCCCCCATG
GTAAGTCGAG GCCAAGGGTT GCTAGTTCCG CTCAATGTAC TAGGGGGTAC



7951 TTGTGCAAAA AAGCGGTTAG CTCCTTCGGT CCTCCGATCG TTGTCAGAAG
AACACGTTTT TTCGCCAATC GAGGAAGCCA GGAGGCTAGC AACAGTCTTC



8001 TAAGTTGGCC GCAGTGTTAT CACTCATGGT TATGGCAGCA CTGCATAATT
ATTCAACCGG CGTCACAATA GTGAGTACCA ATACCGTCGT GACGTATTAA



8051 CTCTTACTGT CATGCCATCC GTAAGATGCT TTTCTGTGAC TGGTGAGTAC
GAGAATGACA GTACGGTAGG CATTCTACGA AAAGACACTG ACCACTCATG



8101 TCAACCAAGT CATTCTGAGA ATAGTGTATG CGGCGACCGA GTTGCTCTTG
AGTTGGTTCA GTAAGACTCT TATCACATAC GCCGCTGGCT CAACGAGAAC



8151 CCCGGCGTCA ATACGGGATA ATACCGCGCC ACATAGCAGA ACTTTAAAAG
GGGCCGCAGT TATGCCCTAT TATGGCGCGG TGTATCGTCT TGAAATTTTC



8201 TGCTCATCAT TGGAAAACGT TCTTCGGGGC GAAAACTCTC AAGGATCTTA
ACGAGTAGTA ACCTTTTGCA AGAAGCCCCG CTTTTGAGAG TTCCTAGAAT



8251 CCGCTGTTGA GATCCAGTTC GATGTAACCC ACTCGTGCAC CCAACTGATC
GGCGACAACT CTAGGTCAAG CTACATTGGG TGAGCACGTG GGTTGACTAG



8301 TTCAGCATCT TTTACTTTCA CCAGCGTTTC TGGGTGAGCA AAAACAGGAA
AAGTCGTAGA AAATGAAAGT GGTCGCAAAG ACCCACTCGT TTTTGTCCTT



8351 GGCAAAATGC CGCAAAAAAG GGAATAAGGG CGACACGGAA ATGTTGAATA
CCGTTTTACG GCGTTTTTTC CCTTATTCCC GCTGTGCCTT TACAACTTAT



8401 CTCATACTCT TCCTTTTTCA ATATTATTGA AGCATTTATC AGGGTTATTG
GAGTATGAGA AGGAAAAAGT TATAATAACT TCGTAAATAG TCCCAATAAC



8451 TCTCATGAGC GGATACATAT TTGAATGTAT TTAGAAAAAT AAACAAATAG
AGAGTACTCG CCTATGTATA AACTTACATA AATCTTTTTA TTTGTTTATC



8501 GGGTTCCGCG CACATTTC
CCCAAGGCGC GTGTAAAG
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1 CTGCAGCCTG AATATGGGCC AAACAGCATA TCTGTGG7AA GCAGTTCCTG 
GACGTCGGAC TTATACCCGG TTTGTCCTAT AGACACCATT CGTCAAGGAC



51 CCCCGGCTCA GGGCCAAGAA CAGATGGAAC AGCTGAATAT GGGCCAAACA
GGGGCCGAGT CCCGGTTCTT GTCTACCTTG TCGACTTATA CCCGGTTTGT



101 GGATATCTGT GGTAAGCAGT TCCTGCCCCG GCTCAGGGCC AAGAACAGAT
CCTATAGACA CCATTCGTCA AGGACGGGGC CGAGTCCCGG TTCTTGTCTA



151 GGTCCCCAGA TGCGGTCCAG CCCTCAGCAG TTTCTAGAGA ACCATCAGAT
CCAGGGGTCT ACGCCAGGTC GGGAGTCGTC AAAGATCTCT TGGTAGTCTA



201 GTTTCCAGGG TGCCCCAAGG ACCTGAAATG ACCCTGTGCC TTATTTGAAC
CAAAGGTCCC ACGGGGTTCC TGGACTTTAC TGGGACACGG AATAAACTTG



251 TAACCAATCA GTTCGCTTCT CGCTTCTGTT CGCGCGCTTC TGCTCCCCGA
ATTGGTTAGT CAAGCGAAGA GCGAAGACAA GCGCGCGAAG ACGAGGGGCT



301 GCTCAATAAA AGAGCCCACA ACCCCTCACT CGGGGCGCCA GTCCTCCGAT
CGAGTTATTT TCTCGGGTGT TGGGGAGTGA GCCCCGCGGT CAGGAGGCTA



351 TGACTGAGTC GCCCGGGTAC CCGTGTATCC AATAAACCCT CTTGCAGTTG
ACTGACTCAG CGGGCCCATG GGCACATAGG TTATTTGGGA GAACGTCAAC



401 CATCCGACTT GTGGTCTCGC TGTTCCTTGG GAGGGTCTCC TCTGAGTGAT
GTAGGCTGAA CACCAGAGCG ACAAGGAACC CTCCCAGAGG AGACTCACTA



451 TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGGGCTCGT CCGGGATCGG
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC



501 GAGACCCCTG CCCAGGGACC ACCGACCCAC CACCGGGAGG CAAGCTGGCC
CTCTGGGGAC GGGTCCCTGG TGGCTGGGTG GTGGCCCTCC GTTCGACCGG



551 AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG ACTGATTTTA
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC TGACTAAAAT



601 TGCGCCTGCG TCGGTACTAG TTAGCTAACT AGCTCTGTAT CTGGCGGACC
ACGCGGACGC AGCCATGATC AATCGATTGA TCGAGACATA GACCGCCTGG



651 CGTGGTGGAA CTGACGAGTT CTGAACACCC GGCCGCAACC CTGGGAGACG
GCACCACCTT GACTGCTCAA GACTTGTGGG CCGGCGTTGG GACCCTCTGC



701 TCCCAGGGAC TTTGGGGGCC GTTTTTGTGG CCCGACCTGA GGAAGGGAGT
AGGGTCCCTG AAACCCCCGG CAAAAACACC GGGCTGGACT CCTTCCCTCA



751 CGATGTGGAA TCCGACCCCG TCAGGATATG TGGTTCTGGT AGGAGACGAG
GCTACACCTT AGGCTGGGGC AGTCCTATAC ACCAAGACCA TCCTCTGCTC



801 AACCTAAAAC AGTTCCCGCC TCCGTCTGAA TTTTTGCTTT CGGTTTGGAA
TTGGATTTTG TCAAGGGCGG AGGCAGACTT AAAAACGAAA GCCAAACCTT



851 CCGAAGCCGC GCGTCTTGTC TGCTGCAGCA TCGTTCTGTG TTGTCTCTGT
GGCTTCGGCG CGCAGAACAG ACGACGTCGT AGCAAGACAC AACAGAGACA



901 CTGACTGTGT TTCTGTATTT GTCTGAAAAT TAGGGCCAGA CTGTTACCAC
GACTGACACA AAGACATAAA CAGACTTTTA ATCCCGGTCT GACAATGGTG
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951 TCCCTTAAGT TTGACCTTAG GTAACTGGV^ AGATGTCGAG CGGCTCGCTC 
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC 
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG 



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAATCAGG 
CCGGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTAGTCC



1401 CCTTGGCGCG CCGGATCCTT AATTAAGCGC AATTGGGAGG TGGCGGTAGC
GGAACCGCGC GGCCTAGGAA TTAATTCGCG TTAACCCTCC ACCGCCATCG



1451 CTCGAGATGG GCGTGATTAC GGATTCACTG GCCGTCGTTT TACAACGTCG
GAGCTCTACC CGCACTAATG CCTAAGTGAC CGGCAGCAAA ATGTTGCAGC 



1501 TGACTGGGAA AACCCTGGCG TTACCCAACT TAATCGCCTT GCAGCACATC
ACTGACCCTT TTGGGACCGC AATGGGTTGA ATTAGCGGAA CGTCGTGTAG 



1551 CCCCTTTCGC CAGCTGGCGT AATAGCGAAG AGGCCCGCAC CGATCGCCCT 
GGGGAAAGCG GTCGACCGCA TTATCGCTTC TCCGGGCGTG GCTAGCGGGA



1601 TCCCAACAGT TACGCAGCCT GAATGGCGAA TGGCGCTTTG CCTGGTTTCC
AGGGTTGTCA ATGCGTCGGA CTTACCGCTT ACCGCGAAAC GGACCAAAGG



1651 GGCACCAGAA GCGGTGCCGG AAAGCTGGCT GGAGTGCGAT CTTCCTGAGG 
CCGTGGTCTT CGCCACGGCC TTTCGACCGA CCTCACGCTA GAAGGACTCC 



1701 CCGATACTGT CGTCGTCCCC TCAAACTGGC AGATGCACGG TTACGATGCG
GGCTATGACA GCAGCAGGGG AGTTTGACCG TCTACGTGCC AATGCTACGC 



1751 CCCATCTACA CCAACGTGAC CTATCCCATT ACGGTCAATC CGCCGTTTGT
GGGTAGATGT GGTTGCACTG GATAGGGTAA TGCCAGTTAG GCGGCAAACA



1801 TCCCACGGAG AATCCGACGG GTTGTTACTC GCTCACATTT AATGTTGATG
AGGGTGCCTC TTAGGCTGCC CAACAATGAG CGAGTGTAAA TTACAACTAC 



1851 AAAGCTGGCT ACAGGAAGGC CAGACGCGAA TTATTTTTGA TGGCGTTAAC 
TTTCGACCGA TGTCCTTCCG GTCTGCGCTT AATAAAAACT ACCCCAATTC 
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1001 TCGGCGTTTC ATCTGTGGTG CAACGGCCGC TGGGTCGGTT ACGGCCAGCA 
AGCCGCAAAG TAGACflCCAC GTTGCCCGCG ACCCAGCCAfl TGCCGGTCCT 



1951 CAGTCGTTTG CCGTCTGAAT TTGACCTGAG CGCATTTTTA CGCGCCGGAG 
GTCAGCAAAC GGCAGACTTA AACTGGACTC GCGTAAAAAT GCGCGGCCTC 



2001 AAAACCGCCr CGCGGTGATG GTGCTGCGCT GGAGTGAOSC CAGTTATCTG 
TTTTGGCGGA GCGCCACTAC CACGACGCGA CCTCACTGCC GTCAATAGAC 



2051 GAAGATCAGG ATATGTGGCG GATGAGCGGC ATTTTCCCTG ACGTCTCGTT 
CTTCTAGTCC TATACACCGC CTACTCGCCG TAAAAGGC^C TGCAGAGCAA 



2101 GCTGCATAAA CCGACTACAC AAATCAGCGA rTTCCATGTT GCCACTCGCT 
CGACGTATTT GGCTGATGTG TTTAGTCGCT AAAGGTACAA CGGTGAGCGA 



2151 TTAATGATGA TTTCAGCCGC GCTGTACTGG AGGCTGAAGT TCAGATGTGC 
AATTACTACr AAAGTCGGCG CGACATGACC TCCGACXTCA AGTCTACACG 



2201 GGCGAGTTGC GTGACTACCT ACGGGTAACA GTTTCTTTAT GGCAGGGTGA 
CCGCTCAACG CACTGATGGA TGCCCATTGT CAAAGAAATA CCGTCCCACT 



2251 AACGCAGGTC GCCAGCGGCA CCGCGCCTTT CGGCGGTGAA ATTATCGATG 
TTGCGTCCAG CGGTCGCCGT GGCGCGGAAA GCCGCCACTT TAATAGCTAC 



2301 AGCGTGGTGG TTATGCCGAT CGCGTCACAC TACGTCTGAA CGTCGAAAAC 
TCGCACCACC. AATACGGCTA GCGCAGTGTG ATGCAGACTT GCAGCTTTTG 



2351 CCGAAACTGT GGAGCGCCGA AATCCCGAAT CTCTATCGTG CGGTGGTTGA
GGCTTTGACA CCTCGCGGCT TTAGGGCTTA GAGATAGCAC GCCACCAACT



2401 ACTGCACACC GCCGACGGCA CGCTGATTGA AGCAGAAGCC TGCGATGTCG
TGACGTGTGG CGGCTGCCGT GCGACTAACT TCGTCTTCGG ACGCTACAGC



2451 GTTTCCGCGA GGTGCGGATT GAAAATGGTC TGCTGCTGCT GAACGGCAAG
CAAAGGCGCT CCACGCCTAA CTTTTACCAG ACGACGACGA CTTGCCGTTC 



2501 CCGTTGCTGA TTCGAGGCGT TAACCGTCAC GAGCATCATC CTCTGCATGG 
GGCAACGACT AAGCTCCGCA ATTGGCAGTG CTCGTAGTAG GAGACGTACC 



2551 TCAGGTCATG GATGAGCAGA CGATGGTGCA GGATATCCTG CTGATGAAGC 
AGTCCAGTAC CTACTCGTCT GCTACCACGT CCTATAGGAC GACTACTTCG



2601 AGAACAACTT TAACGCCGTG CGCTGTTCGC ATTATCCGAA CCATCCGCTG
TCTTGTTGAA ATTGCGGCAC GCGACAAGCG TAATAGGCTT GGTAGGCGAC



2651 TGGTACACGC TGTGCGACCG CTACGGCCTG TATGTGGTGG ATGAAGCCAA
ACCATGTGCG ACACGCTGGC GATGCCGGAC ATACACCACC TACTTCGGTT



2701 TATTGAAACC CACGGCATGG TGCCAATGAA TCGTCTGACC GATGATCCGC
ATAACTTTGG GTGCCGTACC ACGGTTACTT AGCAGACTGG CTACTAGGCG 



2751 GCTGGCTACC GGCGATGAGC GAACGCGTAA CGCGAATGGT GCAGCGCGAT
CGACCGATGG CCGCTACTCG CTTGCGCATT GCGCTTACCA CGTCGCGCTA



2801 CGTAATCACC CGAGTGTGAT CATCTGGTCG CTGGGGAATG AATCAGGCCA
GCATTAGTGG GCTCACACTA GTAGACCAGC GACCCCTTAC TTAGTCCGGT
2851 CGGCGCTAAT CACGACGCGC TGTATCGCTG GATCAAATCT GTCGATCCTT
GCCGCGATTA GTGCTGCGCG ACATAGCGAC CTAGTTTAGA CAGCTAGGAA



2901 CCCGCCCGGT GCAGTATGAA GGCGGCGGAG CCGACACCAC GGCCACCGAT
GGGCGGGCCA CGTCATACTT CCGCCGCCTC GGCTGTGGTG CCGGTGGCTA



2951 ATTATTTGCC CGATGTACGC GCGCGTGGAT GAAGACCAGC CCTTCCCGGC
TAATAAACGG GCTACATGCG CGCGCACCTA CTTCTGGTCG GGAAGGGCCG



3001 TGTGCCGAAA TGGTCCATCA AAAAATGGCT TTCGCTACCT GGAGAGACGC 
ACACGGCTTT ACCAGGTAGT TTTTTACCGA AAGCGATGGA CCTCTCTGCG



3051 GCCCGCTGAT CCTTTGCGAA TACGCCCACG CGATGGGTAA CAGTCTTGGC
CGGGCGACTA GGAAACGCTT ATGCGGGTGC GCTACCCATT GTCAGAACCG



3101 GGTTTCGCTA AATACTGGCA GGCGTTTCGT CAGTATCCCC GTTTACAGGG
CCAAAGCGAT TTATGACCGT CCGCAAAGCA GTCATAGGGG CAAATGTCCC



3151 CGGCTTCGTC TGGGACTGGG TGGATCAGTC GCTGATTAAA TATGATGAAA 
GCCGAAGCAG ACCCTGACCC ACCTAGTCAG CGACTAATTT ATACTACTTT



3201 ACGGCAACCC GTGGTCGGCT TACGGCGGTG ATTTTGGCGA TACGCCGAAC
TGCCGTTGGG CACCAGCCGA ATGCCGCCAC TAAAACCGCT ATGCGGCTTG



3251 GATCGCCAGT TCTGTATGAA CGGTCTGGTC TTTGCCGACC GCACGCCGCA 
CTAGCGGTCA AGACATACTT GCCAGACCAG AAACGGCTGC CGTGCGGCGT 



3301 TCCAGCGCTG ACGGAAGCAA AACACCAGCA GCAGTTTTTC CAGTTCCGTT 
AGGTCGCGAC TGCCTTCGTT TTGTGGTCGT CGTCAAAAAG GTCAAGGCAA



3351 TATCCGGGCA AACCATCGAA GTGACCAGCG AATACCTGTT CCGTCATAGC
ATAGGCCCGT TTGGTAGCTT CACTGGTCGC TTATGGACAA GGCAGTATCG



3401 GATAACGAGC TCCTGCACTG GATGGTGGCG CTGGATGGTA AGCCGCTGGC
CTATTGCTCG AGGACGTGAC CTACCACCGC GACCTACCAT TCGGCGACCG 



3451 AAGCGGTGAA GTGCCTCTGG ATGTCGCTCC ACAAGGTAAA CAGTTGATTG 
TTCGCCACTT CACGGAGACC TACAGCGAGG TGTTCCATTT GTCAACTAAC 



3501 AACTGCCTGA ACTACCGCAG CCGGAGAGCG CCGGGCAACT CTGGCTCACA 
TTGACGGACT TGATGGCGTC GGCCTCTCGC GGCCCGTTGA GACCGAGTGT 



3551 GTACGCGTAG TGCAACCGAA CGCGACCGCA TGGTCAGAAG CCGGGCACAT
CATGCGCATC ACGTTGGCTT GCGCTGGCGT ACCAGTCTTC GGCCCGTGTA 



3601 CAGCGCCTGG CAGCAGTGGC GTCTGGCGGA AAACCTCAGT GTGACGCTCC
GTCGCGGACC GTCGTCACCG CAGACCGCCT TTTGGAGTCA CACTGCGAGG



3651 CCGCCGCGTC CCACGCCATC CCGCATCTGA CCACCAGCGA AATGGATTTT
GGCGGCGCAG GGTGCGGTAG GGCGTAGACT GGTGGTCGCT TTACCTAAAA



3701 TGCATCGAGC TGGGTAATAA GCGTTGGCAA TTTAACCGCC AGTCAGGCTT
ACGTAGCTCG ACCCATTATT CGCAACCGTT AAATTGGCGG TCAGTCCGAA



3751 TCTTTCACAG ATGTGGATTG GCGATAAAAA ACAACTGCTG ACGCCGCTGC 
AGAAAGTGTC TACACCTAAC CGCTATTTTT TGTTGACGAC TGCGGCGACG
3801


GCGATCAGTT CACCCGTGTC GATAGATCTG AACAGAAACT CATTTCCGAA 
CGCTAGTCAA GTGGGCACAG CTATCTAGAC TTGTCTTTGA GTAAAGGCTT




3851


GAAGACCTAG TCGACCATCA TCATCATCAT CACCGGTAAT AATAGGTAGA 
CTTCTGGATC AGCTGGTAGT AGTAGTAGTA GTGGCCATTA TTATCCATCT 




3901 


TAAGTGACTG ATTAGATGCA TTTCGACTAG ATCCCTCGAC CAATTCCGGT
ATTCACTGAC TAATCTACGT AAAGCTGATC TAGGGAGCTG GTTAAGGCCA 




3951 


TATTTTCCAC CATATTGCCG TCTTTTGGCA ATGTGAGGGC CCGGAAACCT
ATAAAAGGTG GTATAACGCC AGAAAACCGT TACACTCCCG GGCCTTTGGA 




4001 


GGCCCTGTCT TCTTGACGAG CATTCCTAGG GGTCTTTCCC CTCTCGCCAA 
CCGGGACAGA AGAACTGCTC GTAAGGATCC CCAGAAAGGG GAGAGCGGTT 




4051 


AGGAATGCAA GGTCTGTTGA ATGTCGTGAA GGAAGCAGTT CCTCTGGAAG 
TCCTTACGTT CCAGACAACT TACAGCACTT CCTTCGTCAA GGAGACCTTC 




4101 


CTTCTTGAAG ACAAACAACG TCTGTAGCGA CCCTTTGCAG GCAGCGGAAC 
GAAGAACTTC TGTTTGTTGC AGACATCGCT GGGAAACGTC CGTCGCCTTG




4151 


CCCCCACCTC GCGACAGGTG CCTCTGCGGC CAAAAGCCAC GTGTATAACA
GGGGGTGGAC CGCTGTCCAC CGACACGCCG GTTTTCGGTG CACATATTCT 




4201 


TACACCTGCA AAGGCGGCAC AACCCCAGTG CCACGTTGTG AGTTGGATAG
ATGTGGACGT TTCCGCCGTG TTGGGGTCAC GGTGCAACAC TCAACCTATC 




4251 


TTGTGGAAAG AGTCAAATGG CTCTCCTCAA GCGTATTCAA CAAGGGGCTG
AACACCTTTC TCAGTTTACC GAGAGGAGTT CGCATAAGTT GTTCCCCGAC 




4301 


AAGGATGCCC AGAAGGTACC CCATTGTATG GGATCTGATC TGGGGCCTCG 
TTCCTACGGG TCTTCCATGG GGTAACATAC CCTAGACTAG ACCCCGGAGC 




4351 


GTGCACATGC TTTACATGTG TTTAGTCGAG GTTAAAAAAC GTCTACCCCC
CACGTGTACC AAATGTACAC AAATCACCTC CAATTTTTTG CAGATCCCGG 




4401 


CCCGAACCAC GGGGACGTGG TTTTCCTTTG AAAAACACGA TGATAATACC 
GGGCTTGGTG CCCCTGCACC AAAAGGAAAC TTTTTGTGCT ACTATTATGG 




4451


ATGAAAAAGC CTGAACTCAC CGCGACGTCT GTCGAGAAGT TTCTGATCGA 
TACTTTTTCG GACTTGAGTG GCGCTGCAGA CAGCTCTTCA AAGACTAGCT




4501 


AAAGTTCGAC AGCGTCTCCG ACCTGATGCA GCTCTCGGAG GGCGAAGAAT
TTTCAAGCTG TCGCAGAGGC TGGACTACGT CGAGAGCCTC CCGCTTCTTA




4551 


CTCGTGCTTT CAGCTTCGAT GTAGGAGGGC GTGGATATGT CCTGCGGGTA 
GAGCACGAAA GTCGAAGCTA CATCCTCCCG CACCTATACA GGACGCCCAT 




4601 


AATAGCTGCG CCGATGGTTT CTACAAAGAT CGTTATGTTT ATCGGCACTT
TTATCGACGC GGCTACCAAA GATGTTTCTA GCAATACAAA TAGCCGTGAA 




4651 


TGCATCGGCC GCGCTCCCGA TTCCGGAAGT GCTTGACATT GGGGAATTTA
ACGTAGCCGG CGCGAGGGCT AAGGCCTTCA CGAACTGTAA CCCCTTAAAT




4701 


GCGAGAGCCT GACCTATTGC ATCTCCCGCC GTGCACAGGG TGTCACGTTG
CGCTCTCGGA CTGGATAACG TAGAGGGCGG CACGTGTCCC ACAGTGCAAC 
4751 CAAGACCTGC CTGAAACCGA ACTGCCCGCT GTTCTGCAGC CGGTCGCGGA
GTTCTGGACG GACTTTGGCT TGACGGGCGA CAAGACGTCG GCCAGCGCCT 



4801 GGCCATGGAT GCGATCGCTG CGGCCGATCT TAGCCAGACG AGCGGGTTCG 
CCGGTACCTA CGCTAGCGAC GCCGGCTAGA ATCGGTCTGC TCGCCCAAGC



4851 GCCCATTCGG ACCGCAAGGA ATCGGTCAAT ACACTACATG GCGTGATTTC 
CGGGTAAGCC TGGCGTTCCT TAGCCAGTTA TGTGATGTAC CGCACTAAAG



4901 ATATGCGCGA TTGCTGATCC CCATGTGTAT CACTGGCAAA CTGTGATGGA
TATACGCGCT AACGACTAGG GGTACACATA GTGACCGTTT GACACTACCT



4951 CGACACCGTC AGTGCGTCCG TCGCGCAGGC TCTCGATGAG CTGATGCTTT
GCTGTGGCAG TCACGCAGGC AGCGCGTCCG AGAGCTACTC GACTACGAAA 



5001 GGGCCGAGGA CTGCCCCGAA GTCCGGCACC TCGTGCACGC GGATTTCGGC 
CCCGGCTCCT GACGGGGCTT CAGGCCGTGG AGCACGTGCG CCTAAAGCCG 



5051 TCCAACAATG TCCTGACGGA CAATGGCCGC ATAACAGCGG TCATTGACTG 
AGGTTGTTAC AGGACTGCCT GTTACCGGCG TATTGTCGCC AGTAACTGAC 



5101 GAGCGAGGCG ATGTTCGGGG ATTCCCAATA CGAGGTCGCC AACATCTTCT
CTCGCTCCGC TACAAGCCCC TAAGGGTTAT GCTCCAGCGG TTGTAGAAGA



5151 TCTGGAGGCC GTGGTTGGCT TGTATGGAGC AGCAGACGCG CTACTTCGAG
AGACCTCCGG CACCAACCGA ACATACCTCG TCGTCTGCGC GATGAAGCTC



5201 CGGAGGCATC CGGAGCTTGC AGGATCGCCG CGGCTCCGGG CGTATATGCT
GCCTCCGTAG GCCTCGAACG TCCTAGCGGC GCCGAGGCCC GCATATACGA



5251 CCGCATTGGT CTTGACCAAC TCTATCAGAG CTTGGTTGAC GGCAATTTCG
GGCGTAACCA GAACTGGTTG AGATAGTCTC GAACCAACTG CCGTTAAAGC



5301 ATGATGCAGC TTGGGCGCAG GGTCGATGCG ACGCAATCGT CCGATCCGGA
TACTACGTCG AACCCGCGTC CCAGCTACGC TGCGTTAGCA GGCTAGGCCT 



5351 GCCGGGACTG TCGGGCGTAC ACAAATCGCC CGCAGAAGCG CGGCCGTCTG 
CGGCCCTGAC AGCCCGCATG TGTTTAGCGG GCGTCTTCGC GCCGGCAGAC



5401 GACCGATGGC TGTGTAGAAG TACTCGCCGA TAGTGGAAAC CGACGCCCCA
CTGGCTACCG ACACATCTTC ATGAGCGGCT ATCACCTTTG GCTGCGGGGT



5451 GCACTCGTCC GAGGGCAAAG GAATAGAGTA GATGCCGACC GGGATCTATC 
CGTGAGCAGG CTCCCGTTTC CTTATCTCAT CTACGGCTGG CCCTAGATAG



5501 GATAAAATAA AAGATTTTAT TTAGTGTCCA GAAAAAGGGG GGAATGAAAG 
CTATTTTATT TTCTAAAATA AATCACAGGT CTTTTTCCCC CCTTACTTTC 



5551 ACCCCACCTG TAGGTTTGGC AAGCTACCTT AAGTAACGCC ATTTTGCAAG 
TGGGGTGGAC ATCCAAACCG TTCGATCCAA TTCATTGCGG TAAAACGTTC 



5601 GCATGGAAAA ATACATAACT GAGAATAGAG AAGTTCAGAT CAAGGTCAGG 
CGTACCTTTT TATGTATTGA CTCTTATCTC TTCAAGTCTA GTTCCAGTCC 



5651 AACAGATGGA ACAGCTGAAT ATGGGCCAAA CAGGATATCT GTGGTAAGCA
TTGTCTACCT TGTCGACTTA TACCCGGTTT GTCCTATAGA CACCATTCGT
5701 GTTCCTGCCC CGGCTCAGGG CCAAGAACAG ATGGAACAGC TGAATATGGG
CAAGGACGGG GCCGAGTCCC GGTTCTTGTC TACCTTGTCG ACTTATACCC



5751 CCAAACAGGA TATCTGTGGT AAGCAGTTCC TGCCCCGGCT CAGGGCCAAG 
GGTTTGTCCT ATAGACACCA TTCGTCAAGG ACGGGGCCGA GTCCCGGTTC



5801 AACAGATGGT CCCCAGATGC GGTCCAGCCC TCAGCAGTTT CTAGAGAACC
TTGTCTACCA GGGGTCTACG CCAGGTCGGG AGTCGTCAAA GATCTCTTGG 



5851 ATCAGATGTT TCCAGGGTGC CCCAAGGACC TGAAATGACC CTGTGCCTTA 
TAGTCTACAA AGGTCCCACG GGGTTCCTGG ACTTTACTGG GACACGGAAT 



5901 TTTGAACTAA CCAATCAGTT CGCTTCTCGC TTCTGTTCGC GCGCTTCTGC 
AAACTTGATT GGTTAGTCAA GCGAAGAGCG AAGACAAGCG CGCGAAGACG 



5951 TCCCCGAGCT CAATAAAAGA GCCCACAACC CCTCACTCGG GGCGCCAGTC 
AGGGGCTCGA GTTATTTTCT CGGGTGTTGG GGAGTGAGCC CCGCGGTCAG



6001 CTCCGATTGA CTGAGTCGCC CGGGTACCCG TGTATCCAAT AAACCCTCTT 
GAGGCTAACT GACTCAGCGG GCCCATGGGC ACATAGGTTA TTTGGGAGAA



6051 GCAGTTGCAT CCGACTTGTG GTCTCGCTGT TCCTTGGGAG GGTCTCCTCT 
CGTCAACGTA GGCTGAACAC CAGAGCGACA AGGAACCCTC CCAGAGGAGA



6101 GAGTGATTGA CTACCCGTCA GCGGGGGTCT TTCATTCATG CAGCATGTAT
CTCACTAACT GATGGGCAGT CGCCCCCAGA AAGTAAGTAC GTCGTACATA 



6151 CAAAATTAAT TTGCTTTTTT TTCTTAAGTA TTTACATTAA ATGGCCATAC 
GTTTTAATTA AACCAAAAAA AAGAATTCAT AAATGTAATT TACCGGTATC 



6201 TTGCATTAAT GAATCGGCCA ACGCGCGGGG AGAGGCGGTT TGCGTATTGG
AACGTAATTA CTTAGCCGGT TGCGCGCCCC TCTCCGCCAA ACGCATAACC 



6251 CGCTCTTCCG CTTCCTCGCT CACTGACTCG CTGCGCTCGG TCGTTCGGCT
GCGAGAAGGC GAAGGAGCGA GTGACTGAGC GACGCGAGCC AGCAAGCCGA



6301 GCGGCGAGCG GTATCAGCTC ACTCAAAGGC GGTAATACGG TTATCCACAG
CGCCGCTCGC CATAGTCGAG TGAGTTTCCG CCATTATGCC AATAGGTGTC



6351 AATCAGGGGA TAACGCAGGA AAGAACATGT GAGCAAAAGG CCAGCAAAAG
TTAGTCCCCT ATTGCGTCCT TTCTTGTACA CTCGTTTTCC GGTCGTTTTC



6401 GCCAGGAACC GTAAAAAGGC CGCGTTGCTG GCGTTTTTCC ATAGGCTCCG
CGGTCCTTGG CATTTTTCCG GCGCAACGAC CGCAAAAAGG TATCCGAGGC



6451 CCCCCCTGAC GAGCATCACA AAAATCGACG CTCAAGTCAG AGGTGGCGAA
GGGGGGACTG CTCGTAGTGT TTTTAGCTGC GAGTTCAGTC TCCACCGCTT



6501 ACCCGACAGG ACTATAAAGA TACCAGGCGT TTCCCCCTGG AAGCTCCCTC
TGGGCTGTCC TGATATTTCT ATGGTCCGCA AAGGGGGACC TTCGAGGGAG



6551 GTGCGCTCTC CTGTTCCGAC CCTGCCGCTT ACCGGATACC TGTCCGCCTT 
CACGCGAGAG GACAAGGCTG GGACGGCGAA TGGCCTATGG ACAGGCGGAA 



6601 TCTCCCTTCG GGAAGCGTGG CGCTTTCTCA TAGCTCACGC TGTAGGTATC
AGAGGGAAGC CCTTCGCACC GCGAAAGAGT ATCGAGTGCG ACATCCATAG
6651 TCAGTTCGGT GTAGGTCGTT CGCTCCAAGC TGGGCTGTGT GCACGAACCC
AGTCAAGCCA CATCCAGCAA GCGAGGTTCG ACCCGACACA CGTGCTTGGG



6701 CCCGTTCAGC CCGACCGCTG CGCCTTATCC GGTAACTATC GTCTTGAGTC
GGGCAAGTCG GGCTGGCGAC GCGGAATAGG CCATTGATAG CAGAACTCAG 



6751 CAACCCGGTA AGACACGACT TATCGCCACT GGCAGCAGCC ACTGGTAACA
GTTGGGCCAT TCTGTGCTGA ATAGCGGTGA CCGTCGTCGG TGACCATTGT



6801 GGATTAGCAG AGCGAGGTAT GTAGGCGGTG CTACAGAGTT CTTGAAGTGG
CCTAATCGTC TCGCTCCATA CATCCGCCAC GATGTCTCAA GAACTTCACC



6851 TGGCCTAACT ACGGCTACAC TAGAAGAACA CTATTTGGTA TCTGCGCTCT 
ACCGGATTGA TGCCGATGTG ATCTTCTTCT CATAAACCAT AGACGCGAGA 



6901 GCTGAAGCCA GTTACCTTCG GAAAAAGAGT TGGTAGCTCT TGATCCGGCA
CGACTTCGGT CAATGGAAGC CTTTTTCTCA ACCATCGAGA ACTAGGCCGT 



6951 AACAAACCAC CGCTGGTAGC GGTGGTTTTT TTGTTTGCAA GCAGCAGATT
TTGTTTGGTG GCGACCATCG CCACCAAAAA AACAAACGTT CGTCGTCTAA



7001 ACGCGCAGAA AAAAAGGATC TCAAGAAGAT CCTTTGATCT TTTCTACGGG 
TGCGCGTCTT TTTTTCCTAG AGTTCTTCTA GGAAACTAGA AAAGATGCCC 



7051 GTCTGACGCT CAGTGGAACG AAAACTCACG TTAAGGGATT TTGGTCATGA
CAGACTGCGA GTCACCTTGC TTTTGAGTGC AATTCCCTAA AACCAGTACT 



7101 GATTATCAAA AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT 
CTAATAGTTT TTCCTAGAAG TGGATCTAGG AAAATTTAAT TTTTACTTCA



7151 TTGCGGCCGC AAATCAATCT AAAGTATATA TGAGTAAACT TGGTCTGACA
AACGCCGGCG TTTAGTTAGA TTTCATATAT ACTCATTTGA ACCAGACTGT 



7201 GTTACCAATG CTTAATCAGT GAGGCACCTA TCTCAGCGAT CTGTCTATTT 
CAATGGTTAC GAATTAGTCA CTCCGTGGAT AGAGTCGCTA GACAGATAAA



7251 CGTTCATCCA TAGTTGCCTG ACTCCCCGTC GTGTAGATAA CTACGATACG
GCAAGTAGGT ATCAACGGAC TGAGGGGCAG CACATCTATT GATGCTATGC 



7301 GGAGGGCTTA CCATCTGGCC CCAGTGCTGC AATGATACCG CGAGACCCAC
CCTCCCGAAT GGTAGACCGG GGTCACGACG TTACTATGGC GCTCTGGGTG



7351 GCTCACCGGC TCCAGATTTA TCAGCAATAA ACCAGCCAGC CGGAAGGGCC
CGAGTGGCCG AGGTCTAAAT AGTCGTTATT TGGTCGGTCG GCCTTCCCGG 



7401 GAGCGCAGAA GTGGTCCTGC AACTTTATCC GCCTCCATCC AGTCTATTAA
CTCGCGTCTT CACCAGGACG TTGAAATAGG CGGAGGTAGG TCAGATAATT 



7451 TTGTTGCCGG GAAGCTAGAG TAAGTAGTTC GCCAGTTAAT AGTTTGCGCA
AACAACGGCC CTTCGATCTC ATTCATCAAG CGGTCAATTA TCAAACGCGT 



7501 ACGTTGTTGC CATTGCTACA GGCATCGTGG TGTCACGCTC GTCGTTTGGT 
TGCAACAACG GTAACGATGT CCGTAGCACC ACAGTGCGAG CAGCAAACCA



7551 ATGGCTTCAT TCAGCTCCGC TTCCCAACGA TCAAGGCGAG TTACATGATC 
TACCGAAGTA AGTCGAGGCC AAGGGTTGCT AGTTCCGCTC AATGTACTAG 
7601 CCCCATGTTG TGCAAAAAAG CGGTTAGCTC CTTCGGTCCT CCGATCGTTG
GGGGTACAAC ACGTTTTTTC GCCAATCGAG GAAGCCAGGA GGCTAGCAAC



7651 TCAGAAGTAA GTTGGCCGCA GTGTTATCAC TCATGGTTAT GGCAGCACTG
AGTCTTCATT CAACCGGCGT CACAATAGTG AGTACCAATA CCGTCGTGAC



7701 CATAATTCTC TTACTGTCAT GCCATCCGTA AGATGCTTTT CTGTGACTGG
GTATTAAGAG AATGACAGTA CGGTAGGCAT TCTACGAAAA GACACTGACC



7751 TGAGTACTCA ACCAAGTCAT TCTGAGAATA GTGTATGCGG CGACCGAGTT 
ACTCATGAGT TGGTTCAGTA AGACTCTTAT CACATACGCC GCTGGCTCAA



7801 GCTCTTGCCC GGCGTCAATA CGGGATAATA CCGCGCCACA TAGCAGAACT
CGAGAACGGG CCGCAGTTAT GCCCTATTAT GGCGCGGTGT ATCGTCTTGA



7851 TTAAAAGTGC TCATCATTGG AAAACGTTCT TCGGGGCGAA AACTCTCAAG 
AATTTTCACG AGTAGTAACC TTTTGCAAGA AGCCCCGCTT TTGAGAGTTC



7901 GATCTTACCG CTGTTGAGAT CCAGTTCGAT GTAACCCACT CGTGCACCCA 
CTAGAATGGC GACAACTCTA GGTCAAGCTA CATTGGGTGA GCACGTGGGT



7951 ACTGATCTTC AGCATCTTTT ACTTTCACCA GCGTTTCTGG GTGAGCAAAA
TGACTAGAAG TCGTAGAAAA TGAAAGTGGT CGCAAAGACC CACTCGTTTT



8001 ACAGGAAGGC AAAATGCCGC AAAAAAGGGA ATAAGGGCGA CACGGAAATG
TGTCCTTCCG TTTTACGGCG TTTTTTCCCT TATTCCCGCT GTGCCTTTAC 



8051 TTGAATACTC ATACTCTTCC TTTTTCAATA TTATTGAAGC ATTTATCAGG
AACTTATGAG TATGAGAAGG AAAAAGTTAT AATAACTTCG TAAATAGTCC



8101 GTTATTGTCT CATGAGCGGA TACATATTTG AATGTATTTA GAAAAATAAA
CAATAACAGA GTACTCGCCT ATGTATAAAC TTACATAAAT CTTTTTATTT 



8151 CAAATAGGGG TTCCGCGCAC ATTTC 
GTTTATCCCC AAGGCGCGTG TAAAG 
1


CTGCAGCCTG 
GACGTCGGAC 


AATATGGGCC 
TTATACCCGG 


AAACAGGATA 
TTTGTCCTAT 


TCTGTGGTAA 
AGACACCATT 


GCAGTTCCTG 
CGTCAAGGAC 




51 


CCCCGGCTCA 
GGGGCCGAGT 


GGGCCAAGAA 
CCCGGTTCTT 


CAGATGGAAC 
GTCTACCTTG 


AGCTGAATAT 
TCGACTTATA 


GGGCCAAACA 
CCCGGTTTGT 




101 


GGATATCTGT 
CCTATAGACA 


GGTAAGCAGT 
CCATTCGTCA 


TCCTGCCCCG 
AGGACGGGGC 


GCTCAGGGCC 
CGAGTCCCGG 


AAGAACAGAT 
TTCTTGTCTA 




151 


GGTCCCCAGA 
CCAGGGGTCT 


TGCGGTCCAG 
ACGCCAGGTC 


CCCTCAGCAG 
GGGAGTCGTC 


TTTCTAGAGA 
AAAGATCTCT 


ACCATCAGAT 
TGGTAGTCTA 




201 


GTTTCCAGGG 
CAAAGGTCCC 


TGCCCCAAGG 
ACGGGGTTCC 


ACCTGAAATG 
TGGACTTTAC


ACCCTGTGCC 
TGGGACACGG


TTATTTGAAC 
AATAAACTTG 




251 


TAACCAATCA 
ATTGGTTAGT 


GTTCGCTTCT 
CAAGCGAAGA 


CGCTTCTGTT 
GCGAAGACAA 


CGCGCGCTTC 
GCGCGCGAAG 


TGCTCCCCGA 
ACGAGGGGCT 




301 


GCTCAATAAA 
CGAGTTATTT 


AGAGCCCACA 
TCTCGGGTGT 


ACCCCTCACT 
TGGGGAGTGA 


CGGGGCGCCA 
GCCCCGCGGT 


GTCCTCCGAT 
CAGGAGGCTA 




351 


TGACTGAGTC 
ACTGACTCAG 


GCCCGGGTAC 
CGGGCCCATG 


CCGTGTATCC 
GGCACATAGG 


AATAAACCCT 
TTATTTGGGA 


CTTGCAGTTG 
CAACGTCAAC 




401


CATCCGACTT 
GTAGGCTGAA 


GTGGTCTCGC 
CACCAGAGCG 


TGTTCCTTGG 
ACAAGGAACC 


GAGGGTCTCC
CTCCCAGAGG 


TCTGAGTGAT 
AGACTCACTA 




451 


TGACTACCCG TCAGCGGGGG TCTTTCATTT GGGCGCTCGT CCGGGATCGG 
ACTGATGGGC AGTCGCCCCC AGAAAGTAAA CCCCCGAGCA GGCCCTAGCC




501 


GAGACCCCTG 
CTCTGGGGAC 


CCCAGGGACC 
GGGTCCCTGG 


ACCGACCCAC 
TGGCTGGGTG 


CACCGGGAGG 
GTGGCCCTCC 


CAAGCTGGCC 
GTTCGACCGG 




551 


AGCAACTTAT CTGTGTCTGT CCGATTGTCT AGTGTCTATG
TCGTTGAATA GACACAGACA GGCTAACAGA TCACAGATAC


ACTGATTTTA 
TGACTAAAAT 




601 


TGCGCCTGCG 
ACGCGGACGC 


TCGGTACTAG
AGCCATGATC 


TTAGCTAACT 
AATCGATTGA 


AGCTCTGTAT 
TCGAGACATA 


CTGGCGGACC 
GACCGCCTGG 




651 


CGTGGTGGAA 
GCACCACCTT 


CTGACGAGTT 
GACTGCTCAA 


CTGAACACCC 
GACTTGTGGG 


GGCCGCAACC 
CCGGCGTTGG 


CTGGGAGACG 
GACCCTCTGC 




701 


TCCCAGGGAC 
AGGGTCCCTG 


TTTGGGGGCC 
AAACCCCCGG 


GTTTTTGTGG CCCGACCTGA
CAAAAACACC GGGCTGGACT


GGAAGGGAGT 
CCTTCCCTCA 




751 


CGATGTGGAA
GCTACACCTT 


TCCGACCCCG 
AGGCTGGGGC 


TCAGGATATG 
ACTCCTATAC 


TGGTTCTGGT 
ACCAAGACCA


AGGAGACGAG 
TCCTCTGCTC 




801 


AACCTAAAAC 
TTGGATTTTG 


AGTTCCCGCC 
TCAAGGGCGG 


TCCGTCTGAA 
AGGCAGACTT 


TTTTTGCTTT 

AAAAACGAAA 


CGGTTTGGAA 
GCCAAACCTT 




851 


CCGAAGCCGC 
GGCTTCGGCG 


GCGTCTTGTC 
CGCAGAACAG 


TGCTGCAGCA 
ACGACGTCGT 


TCGTTCTGTG 
AGCAAGACAC 


TTGTCTCTGT 
AACAGAGACA 




901 


CTGACTGTGT 
GACTGACACA 


TTCTGTATTT 
AAGACATAAA 


GTCTGAAAAT TAGGGCCAGA 
CAGACTTTTA ATCCCGGTCT 


CTGTTACCAC 
GACAATGGTG 
951 TCCCTTAAGT TTGACCTTAG GTAACTGGAA AGATCTCGAG CGGCTCGCTC
AGGGAATTCA AACTGGAATC CATTGACCTT TCTACAGCTC GCCGAGCGAG 



1001 ACAACCAGTC GGTAGATGTC AAGAAGAGAC GTTGGGTTAC CTTCTGCTCT 
TGTTGGTCAG CCATCTACAG TTCTTCTCTG CAACCCAATG GAAGACGAGA 



1051 GCAGAATGGC CAACCTTTAA CGTCGGATGG CCGCGAGACG GCACCTTTAA
CGTCTTACCG GTTGGAAATT GCAGCCTACC GGCGCTCTGC CGTGGAAATT 



1101 CCGAGACCTC ATCACCCAGG TTAAGATCAA GGTCTTTTCA CCTGGCCCGC
GGCTCTGGAG TAGTGGGTCC AATTCTAGTT CCAGAAAAGT GGACCGGGCG



1151 ATGGACACCC AGACCAGGTC CCCTACATCG TGACCTGGGA AGCCTTGGCT
TACCTGTGGG TCTGGTCCAG GGGATGTAGC ACTGGACCCT TCGGAACCGA



1201 TTTGACCCCC CTCCCTGGGT CAAGCCCTTT GTACACCCTA AGCCTCCGCC
AAACTGGGGG GAGGGACCCA GTTCGGGAAA CATGTGGGAT TCGGAGGCGG



1251 TCCTCTTCCT CCATCCGCCC CGTCTCTCCC CCTTGAACCT CCTCGTTCGA 
AGGAGAAGGA GGTAGGCGGG GCAGAGAGGG GGAACTTGGA GGAGCAAGCT 



1301 CCCCGCCTCG ATCCTCCCTT TATCCAGCCC TCACTCCTTC TCTAGGCGCC
GGGGCGGAGC TAGGAGGGAA ATAGGTCGGG AGTGAGGAAG AGATCCGCGG 



1351 GGCCGCTCTA GCCCATTAAT ACGACTCACT ATAGGGCGAT TCGAACACCA
CCCGCGAGAT CGGGTAATTA TGCTGAGTGA TATCCCGCTA AGCTTGTGGT



1401 TGCACCATCA TCATCATCAC GTCGACGAAC AGAAACTCAT TTCCGAAGAA 
ACGTGGTAGT AGTAGTAGTG CAGCTGCTTG TCTTTGAGTA AAGGCTTCTT



1451 GACCTACTCG AGATGGGCGT GATTACGGAT TCACTGGCCG TCGTTTTACA
CTGGATGAGC TCTACCCGCA CTAATGCCTA AGTGACCGGC AGCAAAATGT



1501 ACGTCGTGAC TGGGAAAACC CTGGCGTTAC CCAACTTAAT CGCCTTGCAG 
TGCAGCACTG ACCCTTTTGG GACCGCAATG GGTTGAATTA GCGGAACGTC



1551 CACATCCCCC TTTCGCCAGC TGGCGTAATA GCGAAGAGGC CCGCACCGAT 
GTGTAGGGGG AAAGCGGTCG ACCGCATTAT CGCTTCTCCG GGCGTGGCTA 



1601 CGCCCTTCCC AACAGTTACG CAGCCTGAAT GGCGAATGGC GCTTTGCCTG 
GCGGGAAGGG TTGTCAATGC GTCGGACTTA CCGCTTACCG CGAAACGGAC



1651 GTTTCCGGCA CCAGAAGCGG TGCCGGAAAG CTGGCTGGAG TGCGATCTTC 
CAAAGGCCGT GGTCTTCGCC ACGGCCTTTC GACCGACCTC ACGCTAGAAG



1701 CTGAGGCCGA TACTGTCGTC GTCCCCTCAA ACTGGCAGAT GCACGCTTAC 
GACTCCGGCT ATGACAGCAG CAGGGGAGTT TGACCGTCTA CGTGCCAATG



1751 GATGCGCCCA TCTACACCAA CGTGACCTAT CCCATTACGG TCAATCCGCC 
CTACGCGGGT AGATGTGGTT GCACTGGATA GGGTAATGCC AGTTAGGCGG



1801 GTTTGTTCCC ACGGAGAATC CGACGGGTTG TTACTCGCTC ACATTTAATG
CAAACAAGGG TGCCTCTTAG GCTGCCCAAC AATGAGCGAG TGTAAATTAC



1851 TTGATGAAAG CTGGCTACAG GAAGGCCAGA CGCGAATTAT TTTTGATGGC 
AACTACTTTC GACCGATGTC CTTCCGGTCT GCGCTTAATA AAAACTACCG 
1901 GTTAACTCGG CGTTTCATCT GTGGTGCAAC GGGCGCTGGG TCGGTTACGG
CAATTGAGCC GCAAAGTAGA CACCACGTTG CCCGCGACCC AGCCAATGCC



1951


CCAGGACAGT CGTTTGCCGT CTGAATTTGA CCTGAGCGCA 
GGTCCTGTCA GCAAACGGCA GACTTAAACT GGACTCGCGT


TTTTTACGCG 
AAAAATGCGC 


2001 


CCGGAGAAAA 
GGCCTCTTTT 


CCGCCTCGCG GTGATGGTGC TGCGCTGGAG 
GGCGGAGCGC CACTACCACG ACGCGACCTC 


TGACGGCAGT 
ACTGCCGTCA


2051


TATCTGGAAG 
ATAGACCTTC 


ATCAGGATAT GTGGCGGATG AGCGGCATTT TCCGTGACGT
TAGTCCTATA CACCGCCTAC TCGCCGTAAA AGGCACTGCA 


2101 


CTCGTTGCTG 
GAGCAACGAC 


CATAAACCGA CTACACAAAT CAGCGATTTC 
GTATTTGGCT GATGTGTTTA GTCGCTAAAG 


CATGTTGCCA 
GTACAACGGT 


2151 


CTCGCTTTAA 
GAGCGAAATT


TGATGATTTC AGCCGCGCTG TACTGGAGGC 
ACTACTAAAG TCGGCGCGAC ATGACCTCCG 


TGAAGTTCAG 
ACTTCAAGTC 


2201 


ATGTGCGGCG 
TACACGCCGC 


AGTTGCGTGA CTACCTACGG GTAACAGTTT 
TCAACGCACT GATGGATGCC CATTGTCAAA 


CTTTATGGCA 
GAAATACCGT 


2251 


GGGTGAAACG CAGGTCGCCA GCGGCACCGC GCCTTTCGGC
CCCACTTTGC GTCCAGCGGT CGCCGTGGCG CGGAAAGCCG 


GGTGAAATTA
CCACTTTAAT 


2301 


TCGATGAGCG 
AGCTACTCGC 


TGGTGGTTAT GCCGATCGCG TCACACTACG
ACCACCAATA CGGCTAGCGC AGTGTGATGC


TCTGAACGTC 
AGACTTGCAG 


2351 


GAAAACCCGA 
CTTTTGGGGT 


AACTGTGGAG CGCCGAAATC CCGAATCTCT
TTGACACCTC GCGGCTTTAG GGCTTAGAGA 


ATCGTGCGGT 
TAGCACGCCA 


2401 


GGTTGAACTG 
CCAACTTGAC 


CACACCGCCG ACGGCACGCT GATTGAAGCA 
GTGTGGCGGC TGCCGTGCGA CTAACTTCGT 


GAAGCCTGCG 
CTTCGGACGC 


2451 


ATGTCGGTTT 
TACAGCCAAA 


CCGCGAGGTG CGGATTGAAA ATGGTCTGCT 
GGCGCTCCAC GCCTAACTTT TACCAGACGA 


GCTGCTGAAC 
CGACGACTTG 


2501 


GGCAAGCCGT 
CCGTTCGGCA 


TGCTGATTCG AGGCGTTAAC CGTCACGAGC 
ACGACTAAGC TCCGCAATTG GCAGTGCTCG 


ATCATCCTCT 
TAGTAGGAGA


2551 


GCATGGTCAG 
CGTACCAGTC 


GTCATGGATG AGCAGACGAT GGTGCAGGAT 
CAGTACCTAC TCGTCTGCTA CCACGTCCTA 


ATCCTGCTGA 
TAGGACGACT


2601 


TGAAGCAGAA 
ACTTCGTCTT 


CAACTTTAAC GCCGTGCGCT GTTCGCATTA
GTTGAAATTG CGGCACGCGA CAAGCGTAAT 


TCCGAACCAT
AGGCTTGGTA


2651 


CCGCTGTGGT 
GGCGACACCA 


ACACGCTGTG CGACCGCTAC GGCCTGTATG
TGTGCGACAC GCTGGCGATG CCGGACATAC 


TGGTGGATGA
ACCACCTACT 


2701 


AGCCAATATT 
TCGGTTATAA 


GAAACCCACG GCATGGTGCC AATGAATCGT
CTTTGGGTGC CGTACCACGG TTACTTAGCA 


CTGACCGATG 
GACTGGCTAC 


2751 


ATCCGCGCTG 
TAGGCGCGAC 


GCTACCGGCG ATGAGCGAAC GCGTAACGCG 
CGATGGCCGC TACTCGCTTG CGCATTGCGC 


AATGGTGCAG 
TTACCACGTC


2801 


CGCGATCGTA 
GCGCTAGCAT


ATCACCCGAG TGTGATCATC TGGTCGCTGG
TAGTGGGCTC ACACTAGTAG ACCAGCGACC


GGAATGAATC 
CCTTACTTAG 
2851


AGGCCACGGC 
TCCGGTGCCG 


GCTAATCACG 
CGATTAGTGC 


ACGCGCTGTA TCGCTGGATC 
TGCGCGACAT AGCGACCTAG


AAATCTGTCG 
TTTAGACAGC 




2901 


ATCCTTCCCG 
TAGGAAGGGC 


CCCGGTGCAG 
GGGCCACGTC 


TATGAAGGCG GCGGAGCCGA 
ATACTTCCGC CGCCTCGGCT 


CACCACGGCC 
GTGGTGCCGG 




2951 


ACCGATATTA 
TGGCTATAAT 


TTTGCCCGAT 
AAACGGGCTA 


GTACGCGCGC GTGGATGAAG 
CATGCGCGCG CACCTACTTC 


ACCAGCCCTT 
TGGTCGGGAA 




3001 


CCCGGCTGTG 
GGGCCGACAC 


CCGAAATGGT 
GGCTTTACCA 


CCATCAAAAA ATGGCTTTCG 
GGTAGTTTTT TACCGAAAGC


CTACCTGGAG 
GATGGACCTC 




3051 


AGACGCGCCC 
TCTGCGCGGG 


GCTGATCCTT 
CGACTAGGAA 


TGCGAATACG CCCACGCGAT 
ACGCTTATGC GGGTGCGCTA


GGGTAACAGT 
CCCATTGTCA 




3101 


CTTGGCGGTT 
GAACCGCCAA 


TCGCTAAATA 
AGCGATTTAT 


CTGGCAGGCG TTTCGTCAGT
GACCGTCCGC AAAGCAGTCA 


ATCCCCGTTT 
TAGGGGCAAA 




3151 


ACAGGGCGGC 
TGTCCCGCCG 


TTCGTCTGGG 
AAGCAGACCC 


ACTGGGTGGA TCAGTCGCTG 
TGACCCACCT AGTCAGCGAC


ATTAAATATG 
TAATTTATAC 




3201 


ATGAAAACGG 
TACTTTTGCC 


CAACCCGTGG
GTTGGGCACC 


TCGGCTTACG GCGGTGATTT
AGCCGAATGC CGCCACTAAA


TGGCGATACG 
ACCGCTATGC 




3251 


CCGAACGATC
GGCTTGCTAG


GCCAGTTCTG 
CGGTCAAGAC 


TATGAACGGT CTGGTCTTTG 
ATACTTGCCA GACCAGAAAC 


CCGACCGCAC 
GGCTGGCGTG 




3301 


GCCGCATCCA
CGGCGTAGGT 


GCGCTGACGG 
CGCGACTGCC 


AAGCAAAACA CCAGCAGCAG 
TTCGTTTTGT GGTCGTCGTC


TTTTTCCAGT 
AAAAAGGTCA 




3351 


TCCGTTTATC 
AGGCAAATAG 


CGGGCAAACC 
GCCCGTTTGG 


ATCGAAGTGA CCAGCGAATA 
TAGCTTCACT GGTCGCTTAT


CCTGTTCCGT 
GGACAAGGCA 




3401 


CATAGCGATA 
GTATCGCTAT 


ACGAGCTCCT 
TGCTCGAGGA 


GCACTGGATG GTGGCGCTGG
CGTGACCTAC CACCGCGACC


ATGGTAAGCC 
TACCATTCGG 




3451 


GCTGGCAAGC 
CGACCGTTCG 


GGTGAAGTGC 
CCACTTCACG 


CTCTGGATGT CGCTCCACAA
GAGACCTACA GCGAGGTGTT 


GGTAAACAGT 
CCATTTGTCA




3501 


TGATTGAACT 
ACTAACTTGA 


GCCTGAACTA 
CGGACTTGAT 


CCGCAGCCGG AGAGCGCCGG 
GGCGTCGGCC TCTCGCGGCC 


GCAACTCTGG 
CGTTGAGACC




3551 


CTCACAGTAC 
GAGTGTCATG


GCGTAGTGCA 
CGCATCACGT 


ACCGAACGCG ACCGCATGGT
TGGCTTGCGC TGGCGTACCA 


CAGAAGCCGG 
GTCTTCGGOC 




3601 


GCACATCAGC 
CGTGTAGTCG 


GCCTGGCAGC 
CGGACCGTCG 


AGTGGCGTCT GGCGGAAAAC
TCACCGCAGA CCGCCTTTTG


CTCACTGTGA 
GAGTCACACT 




3651 


CGCTCCCCGC 
GCGAGGGGCG 


CGCGTCCCAC 
GCGCAGGGTG 


GCCATCCCGC ATCTGACCAC 
CGGTAGGGCG TAGACTGGTG 


CAGCGAAATG 
GTCGCTTTAC 




3701 


GATTTTTGCA 
CTAAAAACGT 


TCGAGCTGGG 
AGCTCGACCC 


TAATAAGCGT TGGCAATTTA
ATTATTCGCA ACCGTTAAAT 


ACCGCCAGTC
TGGCGGTCAG 




3751


AGGCTTTCTT
TCCGAAAGAA 


TCACAGATGT
AGTGTCTACA 


GGATTGGCGA TAAAAAACAA CTGCTGACGC
CCTAACCGCT ATTTTTTGTT GACGACTGCG 
3801 CGCTGCGCGA TCAGTTCACC CGTGTCGATA GATCTGGAGG TGGTGGCAGC
GCGACGCGCT AGTCAAGTGG GCACAGCTAT CTAGACCTCC ACCACCGTCG



3851 AGGCCTTGCC GCGCCCGATC CTTAATTAAC AATTCACCGG TAATAATAGG 
TCCGGAACCG CGCGGCCTAG GAATTAATTG TTAACTGGCC ATTATTATCC 



3901 TAGATAAGTG ACTGATTAGA TGCATTTCGA CTAGATCCCT CGACCAATTC 
ATCTATTCAC TGACTAATCT ACGTAAAGCT GATCTAGGGA GCTGGTTAAG 



3951 CGGTTATTTT CCACCATATT GCCGTCTTTT GGCAATGTGA GGGCCCGGAA
GCCAATAAAA GGTGGTATAA CGGCAGAAAA CCGTTACACT CCCGGGCCTT 



4001 ACCTGGCCCT GTCTTCTTGA CGAGCATTCC TAGGGGTCTT TCCCCTCTCG
TGGACCGGGA CAGAAGAACT GCTCGTAAGG ATCCCCAGAA AGGGGAGAGC



4051 CCAAAGGAAT GCAAGGTCTG TTGAATGTCG TGAAGGAAGC AGTTCCTCTG
GGTTTCCTTA CGTTCCAGAC AACTTACAGC ACTTCCTTCG TCAAGGAGAC



4101 GAAGCTTCTT GAAGACAAAC AACGTCTGTA GCGACCCTTT GCAGGCAGCG 
CTTCGAAGAA CTTCTGTTTG TTGCAGACAT CGCTGGGAAA CGTCCGTCGC 



4151 GAACCCCCCA CCTGGCGACA GGTGCCTCTG CGGCCAAAAG CCACGTGTAT
CTTGGGGGGT GGACCGCTGT CCACGGAGAC GCCGGTTTTC GGTGCACATA



4201 AAGATACACC TGCAAAGGCG GCACAACCCC AGTGCCACGT TGTGAGTTGG
TTCTATGTGG ACGTTTCCGC CGTGTTGGGG TCACGGTGCA ACACTCAACC



4251 ATAGTTGTGG AAAGAGTCAA ATGGCTCTCC TCAAGCGTAT TCAACAAGGG 
TATCAACACC TTTCTCAGTT TACCGAGAGG AGTTCGCATA AGTTGTTCCC



4301 GCTGAAGGAT GCCCAGAAGG TACCCCATTG TATGGGATCT GATCTGGGGC
CGACTTCCTA CGGGTCTTCC ATGGGGTAAC ATACCCTAGA CTAGACCCCG



4351 CTCGGTGCAC ATGCTTTACA TGTGTTTAGT CGAGGTTAAA AAACGTCTAG 

GAGCCACGTG TACGAAATGT ACACAAATCA GCTCCAATTT TTTGCAGATC

4401 GCCCCCCGAA CCACGGGGAC GTGGTTTTCC TTTGAAAAAC ACGATGATAA 

CGGGGGGCTT GGTGCCCCTG CACCAAAAGG AAACTTTTTG TGCTACTATT 



4451 TACCATGAAA AAGCCTGAAC TCACCGCGAC GTCTGTCGAG AAGTTTCTGA
ATGGTACTTT TTCGGACTTG AGTGGCGCTG CAGACAGCTC TTCAAAGACT



4501 TCGAAAAGTT CGACAGCGTC TCCGACCTGA TGCAGCTCTC GGAGGGCGAA
AGCTTTTCAA GCTGTCGCAG AGGCTGGACT ACGTCGAGAG CCTCCCGCTT



4551 GAATCTCGTG CTTTCAGCTT CGATGTAGGA GGGCGTGGAT ATGTCCTGCG
CTTAGAGCAC GAAAGTCGAA GCTACATCCT CCCGCACCTA TACAGGACGC



4601 GGTAAATAGC TGCGCCGATG GTTTCTACAA AGATCGTTAT GTTTATCGGC
CCATTTATCG ACGCGGCTAC CAAAGATGTT TCTAGCAATA CAAATAGCCG



4651 ACTTTGCATC GGCCGCGCTC CCGATTCCGG AAGTGCTTGA CATTGGGGAA
TGAAACGTAG CCGGCGCGAG GGCTAAGGCC TTCACGAACT GTAACCCCTT



4701 TTTAGCGAGA GCCTGACCTA TTGCATCTCC CGCCGTGCAC AGGGTGTCAC 
AAATCGCTCT CGGACTGGAT AACGTAGAGG GCGGCACGTG TCCCACAGTG



52/71 



WO 01/58923



PCT/US01/00684



4751 GTTGCAAGAC CTGCCTGAAA CCGAACTGCC CGCTGTTCTG CAGCCGGTCG
CAACGTTCTG GACGGACTTT GGCTTGACGG GCGACAAGAC GTCGGCCAGC 



4801 CGGAGGCCAT GGATGCGATC GCTGCGGCCG ATCTTAGCCA GACGAGCGGG
GCCTCCGGTA CCTACGCTAG CGACGCCGGC TAGAATCGGT CTGCTCGCCC



4851 TTCGGCCCAT TCGGACCGCA AGGAATCGGT CAATACACTA CATGGCGTGA
AAGCCGGGTA AGCCTGGCGT TCCTTAGCCA GTTATGTGAT GTACCGCACT



4901 TTTCATATGC GCGATTGCTG ATCCCCATGT GTATCACTGG CAAACTGTGA 
AAAGTATACG CGCTAACGAC TAGGGGTACA CATAGTGACC GTTTGACACT



4951 TGGACGACAC CGTCAGTGCG TCCGTCGCGC AGGCTCTCGA TGAGCTGATG 
ACCTGCTGTG GCAGTCACGC AGGCAGCGCG TCCGAGAGCT ACTCGACTAC 



5001 CTTTGGGCCG AGGACTGCCC CGAAGTCCGG CACCTCGTGC ACGCGGATTT 
GAAACCCGGC TCCTGACGGG GCTTCAGGCC GTGGAGCACG TGCGCCTAAA 



5051 CGGCTCCAAC AATGTCCTGA CGGACAATGG CCGCATAACA GCGGTCATTG
GCCGAGGTTG TTACAGGACT GCCTGTTACC GGCGTATTGT CGCCAGTAAC



5101 ACTGGAGCGA GGCGATGTTC GGGGATTCCC AATACGAGGT CGCCAACATC 
TGACCTCGCT CCGCTACAAG CCCCTAAGGG TTATGCTCCA GCGGTTGTAG 



5151 TTCTTCTGGA GGCCGTGGTT GGCTTGTATG GAGCAGCAGA CGCGCTACTT
AAGAAGACCT CCGGCACCAA CCGAACATAC CTCGTCGTCT GCGCGATGAA 



5201 CGAGCGGAGG CATCCGGAGC TTGCAGGATC GCCGCGGCTC CGGGCGTATA 
GCTCGCCTCC GTAGGCCTCG AACGTCCTAG CGGCGCCGAG GCCCGCATAT



5251 TGCTCCGCAT TGGTCTTGAC CAACTCTATC AGAGCTTGGT TGACGGCAAT
ACGAGGCGTA ACCAGAACTG GTTGAGATAG TCTCGAACCA ACTGCCGTTA



5301 TTCGATGATG CAGCTTGGGC GCAGGGTCGA TGCGACGCAA TCGTCCGATC 
AAGCTACTAC GTCGAACCCG CGTCCCAGCT ACGCTGCGTT AGCAGGCTAG



5351 CGGAGCCGGG ACTGTCGGGC GTACACAAAT CGCCCGCAGA AGCGCGGCCG 
GCCTCGGCCC TGACAGCCCG CATGTGTTTA GCGGGCGTCT TCGCGCCGGC



5401 TCTGGACCGA TGGCTGTGTA GAAGTACTCG CCGATAGTGG AAACCGACGC 
AGACCTGGCT ACCGACACAT CTTCATGAGC GGCTATCACC TTTGGCTGCG 



5451 CCCAGCACTC GTCCGAGGGC AAAGGAATAG AGTAGATGCC GACCGCGATC 
GGGTCGTGAG CAGGCTCCCG TTTCCTTATC TCATCTACGG CTGGCCCTAG



5501 TATCGATAAA ATAAAAGATT TTATTTAGTC TCCAGAAAAA GGGGGGAATG
ATAGCTATTT TATTTTCTAA AATAAATCAG AGGTCTTTTT CCCCCCTTAC



5551 AAAGACCCCA CCTGTAGGTT TGGCAAGCTA GCTTAAGTAA CGCCATTTTG 
TTTCTGGGGT GGACATCCAA ACCGTTCGAT CGAATTCATT GCGGTAAAAC



5601 CAAGGCATGG AAAAATACAT AACTGAGAAT AGAGAAGTTC AGATCAAGGT
GTTCCGTACC TTTTTATGTA TTGACTCTTA TCTCTTCAAG TCTAGTTCCA 



5651 CAGGAACAGA TGGAACAGCT GAATATGGGC CAAACAGGAT ATCTGTGGTA 
GTCCTTGTCT ACCTTGTCGA CTTATACCCG GTTTGTCCTA TAGACACCAT
5701 AGCAGTTCCT GCCCCGGCTC AGGGCCAAGA ACAGATGGAA CAGCTGAATA
TCGTCAAGGA CGGGGCCGAG TCCCGGTTCT TGTCTACCTT GTCGACTTAT



5751 TGGGCCAAAC AGGATATCTG TGGTAAGCAG TTCCTGCCCC GGCTCAGGGC
ACCCGGTTTG TCCTATAGAC ACCATTCGTC AAGGACGGGG CCGAGTCCCG



5801 CAAGAACAGA TGGTCCCCAG ATGCGGTCCA GCCCTCAGCA GTTTCTAGAG 
GTTCTTGTCT ACCAGGGGTC TACGCCAGGT CGGGAGTCGT CAAAGATCTC



5851 AACCATCAGA TGTTTCCAGG GTGCCCCAAG GACCTGAAAT GACCCTGTGC 
TTGGTAGTCT ACAAAGGTCC CACGGGGTTC CTGGACTTTA CTGGGACACG



5901 CTTATTTGAA CTAACCAATC AGTTCGCTTC TCGCTTCTGT TCGCGCGCTT 
GAATAAACTT GATTGGTTAG TCAAGCGAAG AGCGAAGACA AGCGCGCGAA



5951 CTGCTCCCCG AGCTCAATAA AAGAGCCCAC AACCCCTCAC TCGGGGCGCC 
GACGAGGGGC TCGAGTTATT TTCTCGGGTG TTGGGGAGTG AGCCCCGCGG 



6001 AGTCCTCCGA TTGACTGAGT CGCCCGGGTA CCCGTGTATC CAATAAACCC 
TCAGGAGGCT AACTGACTCA GCGGGCCCAT GGGCACATAG GTTATTTGGG 



6051 TCTTGCAGTT GCATCCGACT TGTGGTCTCG CTGTTCCTTG GGAGGGTCTC 
AGAACGTCAA CGTAGGCTGA ACACCAGAGC GACAAGGAAC CCTCCCAGAG 



6101 CTCTGAGTGA TTGACTACCC GTCAGCGGGG GTCTTTCATT CATGCAGCAT 
GAGACTCACT AACTGATGGG CAGTCGCCCC CAGAAAGTAA GTACGTCGTA



6151 GTATCAAAAT TAATTTGGTT TTTTTTCTTA AGTATTTACA TTAAATGGCC 
CATAGTTTTA ATTAAACCAA AAAAAAGAAT TCATAAATGT AATTTACCGG



6201 ATAGTTGCAT TAATGAATCG GCCAACGCGC GGGGAGAGGC GGTTTGCGTA
TATCAACGTA ATTACTTAGC CGGTTGCGCG CCCCTCTCCG CCAAACGCAT 



6251 TTGGCGCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC
AACCGCGAGA AGGCGAAGGA GCGAGTGACT GAGCGACGCG AGCCAGCAAG 



6301 GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC 
CCGACGCCGC TCGCCATAGT CGAGTGAGTT TCCGCCATTA TGCCAATAGG



6351 ACAGAATCAG GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA
TGTCTTAGTC CCCTATTGCG TCCTTTCTTG TACACTCGTT TTCCGGTCGT



6401 AAAGGCCAGG AACCGTAAAA AGGCCGCGTT GCTGGCGTTT TTCCATAGGC
TTTCCGGTCC TTGGCATTTT TCCGGCGCAA CGACCGCAAA AAGGTATCCG



6451 TCCGCCCCCC TGACGAGCAT CACAAAAATC GACGCTCAAG TCAGAGGTGG
AGGCGGGGGG ACTGCTCGTA GTGTTTTTAG CTGCGAGTTC AGTCTCCACC



6501 CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC CTGGAAGCTC 
GCTTTGGGCT GTCCTGATAT TTCTATGGTC CGCAAAGGGG GACCTTCGAG 



6551 CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG
GGAGCACGCG AGAGGACAAG GCTGGGACGG CGAATGGCCT ATGGACAGGC 



6601 CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCATAGCTC ACGCTGTAGG 
GGAAAGAGGG AAGCCCTTCG CACCGCGAAA GAGTATCGAG TGCGACATCC
6651


TATCTCAGTT
ATAGAGTCAA 


CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA 
GCCACATCCA GCAAGCGAGG TTCGACCCGA CACACGTGCT 


6701


ACCCCCCGTT 
TGGGGGGCAA 


CAGCCCGACC GCTGCGCCTT ATCCGGTAAC TATCGTCTTG
GTCGGGCTGG CGACGCGGAA TAGGCCATTG ATAGCAGAAC




AGTCCAACCC
TCAGGTTGGG


GGTAAGACAC GACTTATCGC CACTGGCAGC AGCCACTGGT 
CCATTCTGTG CTGAATAGCG GTGACCGTCG TCGGTGACCA 


6801 AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG AGTTCTTGAA 
TTGTCCTAAT CGTCTCGCTC CATACATCCG CCACGATGTC TCAAGAACTT 


6851 GTGGTGGCCT AACTACGGCT ACACTAGAAG AACAGTATTT GGTATCTGCG 
CACCACCGGA TTGATGCCGA TGTGATCTTC TTGTCATAAA CCATAGACGC 


6901 CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC 
GAGACGACTT CGGTCAATGG AAGCCTTTTT CTCAACCATC GAGAACTAGG 


6951 GGCAAACAAA CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA 
CCGTTTGTTT GGTGGCGACC ATCGCCACCA AAAAAACAAA CGTTCGTCGT 


7001 GATTACGCGC AGAAAAAAAG GATCTCAAGA AGATCCTTTG ATCTTTTCTA 
CTAATGCGCG TCTTTTTTTC CTAGAGTTCT TCTAGGAAAC TAGAAAAGAT 


7051 CGGGGTCTGA CGCTCAGTGG AACGAAAACT CACGTTAAGG GATTTTGGTC 
GCCCCAGACT GCGAGTCACC TTGCTTTTGA GTGCAATTCC CTAAAACCAG 


7101 ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTGC GGCCGCAAAT 
TACTCTAATA GTTTTTCCTA GAAGTGGATC TAGGAAAACG CCGGCGTTTA 


7151 CAATCTAAAG TATATATGAG TAAACTTGGT CTGACAGTTA CCAATGCTTA 
GTTAGATTTC ATATATACTC ATTTGAACCA GACTGTCAAT GGTTACGAAT 


7201 ATCAGTGAGG CACCTATCTC AGCGATCTGT CTATTTCGTT CATCCATAGT 
TAGTCACTCC GTGGATAGAG TCGCTAGACA GATAAAGCAA GTAGGTATCA 


7251 TGCCTGACTC CCCGTCGTGT AGATAACTAC GATACGGGAG GGCTTACCAT 
ACGGACTGAG GGGCAGCACA TCTATTGATG CTATGCCCTC CCGAATGGTA 


7301 CTGGCCCCAG TGCTGCAATG ATACCGCGAG ACCCACGCTC ACCGGCTCCA 
GACCGGGGTC ACGACGTTAC TATGGCGCTC TGGGTGCGAG TGGCCGAGGT 


7351 GATTTATCAG CAATAAACCA GCCAGCCGGA AGGGCCGAGC GCAGAAGTGG 
CTAAATAGTC GTTATTTGGT CGGTCGGCCT TCCCGGCTCG CGTCTTCACC 


7401 TCCTGCAACT TTATCCGCCT CCATCCAGTC TATTAATTGT TGCCGGGAAG 
AGGACGTTGA AATAGGCGGA GGTAGGTCAG ATAATTAACA ACGGCCCTTC 


7451 CTAGAGTAAG TAGTTCGCCA GTTAATAGTT TGCGCAACGT TGTTGCCATT 
GATCTCATTC ATCAAGCGGT CAATTATCAA ACGCGTTGCA ACAACGGTAA 


7501 GCTACAGGCA TCGTGGTGTC ACGCTCGTCG TTTGGTATGG CTTCATTCAG 
CGATGTCCGT AGCACCACAG TGCGAGCAGC AAACCATACC GAAGTAAGTC 


7551 CTCCGGTTCC CAACGATCAA GGCGAGTTAC ATGATCCCCC ATGTTGTGCA 
GAGGCCAAGG GTTGCTAGTT CCGCTCAATG TACTAGGGGG TACAACACGT 



7601 AAAAAGCGGT TAGCTCCTTC GGTCCTCCGA TCGTTGTCAG AAGTAAGTTG 
TTTTTCGCCA ATCGAGGAAG CCAGGAGGCT AGCAACAGTC TTCATTCAAC 



7651 GCCGCAGTGT TATCACTCAT GGTTATGGCA GCACTGCATA ATTCTCTTAC 
CGGCGTCACA ATAGTGAGTA CCAATACCGT CGTGACGTAT TAAGAGAATG 



7701 TGTCATGCCA TCCGTAAGAT GCTTTTCTGT GACTGGTGAG TACTCAACCA 
ACAGTACGGT AGGCATTCTA CGAAAAGACA CTGACCACTC ATGAGTTGGT 



7751 AGTCATTCTG AGAATAGTGT ATGCGGCGAC CGAGTTGCTC TTGCCCGGCG 
TCAGTAAGAC TCTTATCACA TACGCCGCTG GCTCAACGAG AACGGGCCGC 



7801 TCAATACGGG ATAATACCGC GCCACATAGC AGAACTTTAA AAGTGCTCAT 
AGTTATGCCC TATTATGGCG CGGTGTATCG TCTTGAAATT TTCACGAGTA 



7851 CATTGGAAAA CGTTCTTCGG GGCGAAAACT CTCAAGGATC TTACCGCTGT 
GTAACCTTTT GCAAGAAGCC CCGCTTTTGA GAGTTCCTAG AATGGCGACA 



7901 TGAGATCCAG TTCGATGTAA CCCACTCGTG CACCCAACTG ATCTTCAGCA 
ACTCTAGGTC AAGCTACATT GGGTGAGCAC GTGGGTTGAC TAGAAGTCGT 



7951 TCTTTTACTT TCACCAGCGT TTCTGGGTGA GCAAAAACAG GAAGGCAAAA 
AGAAAATGAA AGTGGTCGCA AAGACCCACT CGTTTTTGTC CTTCCGTTTT 



8001 TGCCGCAAAA AAGGGAATAA GGGCGACACG GAAATGTTGA ATACTCATAC 
ACGGCGTTTT TTCCCTTATT CCCGCTGTGC CTTTACAACT TATGAGTATG 



8051 TCTTCCTTTT TCAATATTAT TGAAGCATTT ATCAGGGTTA TTGTCTCATG 
AGAAGGAAAA AGTTATAATA ACTTCGTAAA TAGTCCCAAT AACAGAGTAC 



8101 AGCGGATACA TATTTGAATG TATTTAGAAA AATAAACAAA TAGGGGTTCC 
TCGCCTATGT ATAAACTTAC ATAAATCTTT TTATTTGTTT ATCCCCAAGG 



8151 GCGCACATTT C 
CGCGTGTAAA G 






WO 01/58923 

PCT/US01/00684 



5' MoMuLV LTR 




Figure 14 











Figure 16 






5' MoMuLV LTR 




Figure 17 
Vector for expression of a GPCR with inserted 
serine/threonine amino acid sequences as a fusion with β-gal Δα. 
Vector for expression of mutant (R170E) β-arrestin2 as a fusion 
with β-gal Δω. 



FIGURE 25 